I.  THE DISORDER AND CHANGE DETECTION PROBLEM

FORMULATION: AN OVERVIEW

A.  INTRODUCTION 

This dissertation presents sequential decision methods both in the
non-Bayesian (maximum likelihood) framework and in the Bayesian framework.
The focus is mainly on non-Bayesian methods, where the goal is to detect, as
quickly as possible, changes in the statistical model of a random process when
these changes can occur at a random time, while the false alarm rate is
upper bounded by some given constant.

In the classical detection framework such procedures were considered by
Wald (Wald, 1947), for which the binary hypothesis framework was
developed under the assumption that all the observations come from
one model or from an alternative one. It was not until Page's work (Page,
1954) in the non-Bayesian framework and Shiryayev (Shiryayev, 1961, 1963,
1965) in the Bayesian framework that the problem was extended to detecting a
change from one statistical model to a second model. Lorden (Lorden, 1971)
showed that the cumulative sum tests proposed by Page are asymptotically
optimal when the mean time between false alarms tends to infinity, in the
sense of minimizing the average delay time for detection. Recently, Pollak
(Pollak, 1985) proved an optimality property for the Shiryayev rule.

Two types of problems depend on the time element. The first is the
disorder problem, in which the observations correspond to one
statistical model until some unknown time, after which the samples
correspond to another statistical model. We will use the terms
disorder and change as synonyms, even though a disorder usually refers to a
general change in the density that describes the statistical behavior of the
model, whereas a change most often refers to changes in specific parameters
such as the mean, the variance, etc. The second problem is the
transient problem, in which the disorder decays after some time. In this
dissertation we will focus only on the disorder (change) problem.

When  a  disorder  occurs,  the  random  variables  we  are  concerned  with  are 
the  change  time  and  the  model  parameters  after  the  change.  As  will  be 
presented  throughout  this  dissertation,  the  detection  process  refers  to 
detecting  the  change  as  quickly  as  possible  while  ensuring  infrequent  false 
alarms,  while  the  estimation  process  refers  to  estimating  the  change  time  and 
the  model  parameters  after  the  change.  This  dissertation  focuses  on  the 
detection  element.  The  problem  of  joint  estimation  of  the  change  time  and 
the  model  parameters  is  also  addressed  and  shown  to  appear  in  an  explicit 
closed  form  in  certain  cases. 

We next consider where change detection problems occur. Three typical
situations in which change detection is a critical component are considered.
In the first, detection is used to produce alarms during the monitoring of
dynamical systems, for example for failures in sensors (Willsky, 1976, 1986),
detection of tsunamis and earthquake prediction (Nikiforov, 1986), and
detection of production failures (Assaf and Ritov, 1988). Many more
applications in industrial and military environments can be considered.
Survey papers on fault detection methods are given by Isermann (Isermann,
1984) and Gertler (Gertler, 1988). The second situation arises in the area of
adaptive algorithms, where the presence of abrupt non-stationarities in the
signal causes severe errors in adapting the gains of the recursive
algorithms. Thus, an abrupt change detection procedure is needed to improve
the tracking capability of the algorithm. For a complete survey see Ljung and
Gunnarsson (Ljung and Gunnarsson, 1990).

Finally, the third type of application occurs when the change detection
algorithm is considered as an integral part of the modeling of a signal or a
system. The most popular applications are segmentation of speech signals
using switching parameter methods within AR models (Andre-Obrecht, 1988)
or of various geophysical signals (Nikiforov, 1986). In such cases, switching
methods within the transition matrix of state-space models are used (Tugnait,
1986), or a modified Kalman filter is used to cope with changes modeled as
abrupt transitions in the measurement matrices (Shumway, 1990). Also, the
problem of outlier detection by modifying the Kalman filter was introduced
by Pena and Guttman (Pena and Guttman, 1988).

B.  THE  DISORDER  PROBLEM  FORMULATION 

1.  The  General  Disorder  Problem 

The change detection problem is presented within the hypothesis
testing framework, thus requiring some statistical knowledge about the tested
hypotheses, which in turn are based upon statistical models before and after
the disorder. The model based framework is rich enough to serve as a basis
for the problem formulation, resulting in parametric type tests. As will be
presented later, certain types of change detection procedures known as
cumulative sum or cumsum procedures are able to cope with both the
parametric and the nonparametric forms. Within this framework four types of
change detection problems will be considered.

Let $H_0$ and $H_1$ be the two (simple) hypotheses, corresponding to two
possible probability distributions $P_0$ and $P_1$ on the observation space
of $x$. If a parametric notation is to be used, then the notation
$P(x \mid \theta_0)$ and $P(x \mid \theta_1)$, or $P_0(x)$ and $P_1(x)$, will
be used. The observations $x_1, x_2, \ldots$ are assumed to be independent
random variables.

Type  1:  Classical  Binary  Hypothesis  Testing 

This problem was considered by Wald (Wald, 1947) and can be written as:

$$
\begin{aligned}
&H_0:\; x \sim P_0, \\
&\text{versus} \\
&H_1:\; x \sim P_1,
\end{aligned}
\tag{1-1}
$$

where the notation "$x \sim P$" denotes the condition that $x$ has
distribution $P$. In this problem there is no time index, hence no direct
formulation of a change.

Type 2: Disorder Formulation

This problem was considered by Page (Page, 1954) and can be presented in
the following manner. Let $\nu$ be the unknown time when the change from
$P_0$ to $P_1$ occurred. Let $P_\nu$ denote the probability law when the
change occurred at the $\nu$-th observation. Let $P_0$ denote the probability
law when there is no change, i.e., $\nu = \infty$.

The problem can be presented as

$$
\begin{aligned}
H_0:\;& x_1, x_2, \ldots \sim P_0 && \text{(no change)} \\
&\text{versus} \\
H_\nu:\;& x_1, x_2, \ldots, x_{\nu-1} \sim P_0 \\
& x_\nu, x_{\nu+1}, \ldots \sim P_1 && \text{(change at time } \nu\text{)}
\end{aligned}
\tag{1-2}
$$

If the observation record is finite, of length $s$ say, the detection problem
becomes a multiple hypothesis testing problem, since the test "looks" for at
least one of the $H_\nu$ ($1 \le \nu \le s$) to hold against $H_0$.

Type 3: Transient and Outliers Formulation

Consider two change times $\nu$ and $\tau$ such that

$$
\begin{aligned}
H_0:\;& x_1, x_2, \ldots \sim P_0 \\
&\text{versus} \\
H_1:\;& x_1, x_2, \ldots, x_{\nu-1} \sim P_0 \\
& x_\nu, x_{\nu+1}, \ldots, x_{\tau-1} \sim P_1 \\
& x_\tau, x_{\tau+1}, \ldots \sim P_0.
\end{aligned}
\tag{1-3}
$$

The same arguments about composite testing can be applied here. Notice that
this framework can be extended to the so-called multiple disorder problem, in
which the observations $x_\tau, x_{\tau+1}, \ldots \sim P_2$ ($P_2$ being
another probability density on the observation space).

Type  4:  Initial  Condition  Disruption 

For model based detection schemes based upon state-space, ARMA, etc., the
initial condition is a part of the statistical model. Hence, besides the
ordinary way to model the statistical change as a change from $P_0$ to $P_1$,
a certain class of changes can be modeled as resulting from changes in the
initial condition. This problem is also time related, since the change might
occur at an unknown time.

Once $H_1$ is decided, i.e., a disorder is detected, further questions arise,
such as estimating the change time $\nu$, possibly estimating $\theta_0$ and
$\theta_1$, and in some cases diagnosing which type of change actually
occurred. Thus, detection and the estimation that follows detection, being
two separate issues, can be coupled, but it is important to distinguish
between them.

Both off-line ($n$ fixed) and on-line ($n$ growing) algorithms can be
designed for solving such types of problems, and as shown in the sequel they
differ substantially, both from the change detection formulation and from the
performance evaluation point of view.

2.  Solution  Methods 

The  solution  for  such  problems  is  a  function  of  several  factors. 

a.  Off-line versus On-line Tests

In the off-line formulation, a finite record $x_1, x_2, \ldots, x_T$ is
given, and a test statistic $g_T = g(x_1, x_2, \ldots, x_T)$, compared with a
threshold $A$, has to decide whether or not the change occurred. In the
on-line formulation, the test statistic $g_t = g(x_1, x_2, \ldots, x_t)$ is
computed sequentially, and a decision is reached the first time $g_t$ exceeds
a threshold $A$.

b.  Criterion 

For the classical detection problem (1-1), the criterion in the sense
of Neyman-Pearson (Ghosh, 1970) is based on a test which maximizes the
power, or the probability of detection (the probability of deciding $H_1$
when $H_1$ is actually true), subject to the constraint that the size, or the
false-alarm probability (the probability of deciding $H_1$ when $H_0$ is
true), is less than or equal to a given value.

As seen from equation (1-2), in the off-line framework the change detection
problem involves multiple hypothesis testing, for which the Neyman-Pearson
lemma is not valid (Ghosh, 1970). Therefore the test in this case cannot be
defined as one of maximizing the power, since $H_1$ is not reduced to a
simple distribution but is a set of distributions. In such cases, the best
property for a test is to be Uniformly Most Powerful (UMP), i.e., to have the
highest detection probability for each distribution of the alternative
hypothesis $H_1$. In general, however, no UMP tests exist for change
detection problems. UMP properties can nevertheless be recovered by using
asymptotic analysis (Deshayes and Picard, 1986). In order to cope with the
performance analysis of test statistic functions, the following definition is
needed.

Definition: Stopping time. Let $x_1, x_2, \ldots$ be a sequence of
independent random variables. The nonnegative integer valued random variable
$N$ is said to be a stopping time for the sequence if the event $[N = n]$ is
independent of $x_{n+1}, x_{n+2}, \ldots$. □

Hence, the event $[N = n]$ corresponds to stopping after having observed
$x_1, \ldots, x_n$ and thus must be independent of the values of the random
variables yet to come (Ross, 1989).

For on-line processing, the criterion is modified. Notice that by
using the formulation (1-2), for a large enough number of observations the
change will be detected with probability one. Thus, a natural criterion is
the delay for detection, subject to the constraint that the size of the test
is upper bounded by a given threshold (Page, 1954; Shiryayev, 1963). Lorden
(1971) and Nikiforov (1983) use a slightly different version of the delay for
the on-line problem.

Let $S_n$ denote the test statistic at time $n$. Let $N$ be the stopping
time, and let $A_n$ be a generalized threshold. Then

$$N = \inf\{n : S_n > A_n\} \tag{1-4}$$

defines the stopping rule and stopping time. See Figure 1.1.
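
The stopping rule (1-4) can be illustrated with a short sketch. It assumes the statistic sequence and the thresholds are already available as lists; the function name `stopping_time` and the sample values are hypothetical and serve only as illustration.

```python
def stopping_time(stat_seq, thresholds):
    """Return N = inf{n : S_n > A_n} (1-indexed), or None if no crossing occurs."""
    for n, (s_n, a_n) in enumerate(zip(stat_seq, thresholds), start=1):
        if s_n > a_n:
            return n
    return None


# Hypothetical statistic sequence S_n and a constant threshold A_n = 3.0
S = [0.2, 0.5, 0.1, 1.4, 2.9, 3.6, 4.1]
A = [3.0] * len(S)
N = stopping_time(S, A)
print(N)
```

Returning `None` when the threshold is never crossed mirrors the convention that the infimum of an empty set is infinite, i.e., no alarm is raised on the observed record.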


Figure 1.1. General Characteristics of the Detection Model. The observation
sequence $\{x_n\}$ is transformed into a sequence $\{S_n\}$. A change in the
model structure of $\{x_n\}$ results in a cumulative departure of $\{S_n\}$.
The change is detected by comparison of $\{S_n\}$ with a generalized
threshold $\{A_n\}$ (from Segen and Sanderson, 1980).

The worst case average delay $D$ (Lorden, 1971) is defined by

$$D = E\{N \mid H_1\} = \sup_{\nu \ge 1} \operatorname{ess\,sup}\, E_\nu\left[(N - \nu + 1)^+ \mid x_1, x_2, \ldots, x_{\nu-1}\right] \tag{1-5}$$

where $(a)^+ = \max(0, a)$, and where $E_\nu$ denotes the expectation under
the probability law $P_\nu$, the distribution of the sequence
$x_1, x_2, \ldots$ under which $x_\nu$ is the first term with distribution
$P_1$. In other words, $D$ is the smallest value such that for any
$\nu = 1, 2, \ldots$

$$E_\nu\left[(N - \nu + 1)^+ \mid x_1, x_2, \ldots, x_{\nu-1}\right] \le D$$

meaning that this "minimax" type criterion defines the best worst case for
delay.

Thus, the criterion is defined in terms of the quickest detection of a
change subject to the constraint that the size of the test is upper bounded,
i.e., the desire for a large mean time between false alarms $T$, where $T$ is
also defined in terms of the stopping time

$$T = E\{N \mid H_0\} \tag{1-6}$$

which denotes the expectation under the no-change hypothesis $H_0$. The pair
$(T, D)$ will specify the performance of a given algorithm.
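
As a rough illustration of how the pair $(T, D)$ might be estimated numerically, the sketch below runs a Monte Carlo simulation of Page's cumsum test for a Gaussian jump in the mean. Everything specific here is an assumption made for the example: the Gaussian model, the parameter values, the threshold, and the helper names. The delay is measured for a change at the first sample with the statistic started at zero, which is used here only as a stand-in for the worst case in (1-5).

```python
import random


def cusum_run_length(sample, llr, threshold, max_steps=100_000):
    """Number of samples until Page's cumsum statistic reaches the threshold."""
    g = 0.0
    for n in range(1, max_steps + 1):
        # Recursive cumsum update with reflection at zero
        g = max(0.0, g + llr(sample()))
        if g >= threshold:
            return n
    return max_steps


def estimate_T_and_delay(mu0=0.0, mu1=1.0, sigma=1.0, threshold=4.0,
                         runs=200, seed=0):
    """Monte Carlo estimates of the mean time between false alarms and the delay."""
    rng = random.Random(seed)

    def llr(x):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2)
        return (mu1 - mu0) / sigma**2 * (x - 0.5 * (mu0 + mu1))

    def no_change():
        return rng.gauss(mu0, sigma)

    def changed():
        return rng.gauss(mu1, sigma)

    T = sum(cusum_run_length(no_change, llr, threshold) for _ in range(runs)) / runs
    delay = sum(cusum_run_length(changed, llr, threshold) for _ in range(runs)) / runs
    return T, delay


T_hat, delay_hat = estimate_T_and_delay()
print(f"mean time between false alarms ~ {T_hat:.1f}, mean delay ~ {delay_hat:.1f}")
```

Raising the threshold increases both estimates, which is the tradeoff the pair $(T, D)$ is meant to capture.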

Notice that in the transient or multiple disorder setting of
equation (1-3), a fast detection is necessary, since if
$\tau - \nu + 1 < D$ the transient cannot be detected.

Thus, for the on-line framework, this natural criterion should lead to
the optimal stopping rule, and the question arises: are there test statistics
which are optimal in that sense? A positive answer will be presented in the
sequel.

Different types of criteria can be used for deriving optimal stopping
times for change detection; see Bojdecki and Hosza (Bojdecki and Hosza, 1984)
and Pelkowitz (Pelkowitz, 1987).

For the off-line problem, this question is more difficult because, as
was shown in equation (1-2), change detection problems are multiple
hypothesis problems for which there exists no optimum test in the classical
sense of power (Neyman-Pearson lemma); hence, no UMP tests exist. In
such situations, an asymptotic analysis in which UMP tests can be recovered
is of interest. Deshayes and Picard (Deshayes and Picard, 1986) showed that
UMP tests exist for likelihood-oriented methods in the sense of large
deviation asymptotic analysis (sample size going to infinity).

c.  Optimal  Stopping  Rules 

The off-line point of view was addressed in the last section, where
it was shown that optimality exists only in the sense of asymptotic analysis.
For the on-line point of view, in the non-Bayesian framework, the only
optimality results are given by Shiryayev and Lorden. Lorden (Lorden, 1971)
required that, for some constant $\gamma$, the stopping rule $N$ must satisfy

$$E_0\{N\} = E\{N \mid \nu = \infty\} \ge \gamma.$$

The speed with which a stopping rule detects a (true) change of distribution
is evaluated by (1-5):

$$\sup_{\nu \ge 1} \operatorname{ess\,sup}\, E_\nu\left[(N - \nu + 1)^+ \mid x_1, x_2, \ldots, x_{\nu-1}\right].$$

Lorden showed that a certain class of stopping rules is asymptotically
($\gamma \to \infty$) optimal, and that the cumsum procedure (Page's test,
which can be described as repeated sequential tests) belongs to this class.

In the Bayesian framework, Shiryayev (Shiryayev, 1968, 1978)
solved the problem. He considered a cost function whereby one loses one
unit if $N < \nu$, and loses $c$ units for each observation taken after $\nu$
if $N > \nu$. The prior on $\nu$ is assumed to be geometric. Shiryayev showed
that the optimal stopping rule prescribes stopping as soon as the posterior
probability of the change having occurred exceeds a fixed level.
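
The posterior-threshold rule can be sketched as follows. This is a minimal illustration, not the formulation used later in the dissertation: it assumes independent Gaussian observations with known means before and after the change, a geometric prior with parameter `rho`, and a hypothetical level of 0.95. The posterior is updated with the standard Bayes recursion for a geometric change-time prior.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def shiryayev_stopping_time(observations, rho, level, mu0=0.0, mu1=2.0, sigma=1.0):
    """Stop the first time the posterior probability of a change exceeds `level`.

    Returns (stopping_time, posteriors); stopping_time is None if the level
    is never exceeded on the given record.
    """
    posterior = 0.0  # assumption: no change has occurred before the first sample
    posteriors = []
    for n, x in enumerate(observations, start=1):
        # Prior step: the change may occur at this sample with probability rho
        predicted = posterior + (1.0 - posterior) * rho
        # Bayes step: weigh the two models by their likelihoods
        num = predicted * gaussian_pdf(x, mu1, sigma)
        den = num + (1.0 - predicted) * gaussian_pdf(x, mu0, sigma)
        posterior = num / den
        posteriors.append(posterior)
        if posterior > level:
            return n, posteriors
    return None, posteriors


data = [0.0] * 20 + [2.0] * 10  # noise-free illustration, change at sample 21
N, posteriors = shiryayev_stopping_time(data, rho=0.05, level=0.95)
print(N)
```

On this record the posterior stays small before sample 21 and then rises quickly, so the rule stops a few samples after the change.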

d.  Use  of  Prior  Knowledge 

For change detection problems, prior knowledge can be useful in two cases.

The first case is related to the problem of estimating the change
time after detection. From the Bayesian point of view, knowledge of the
statistical nature of the change time makes up the prior needed for such a
test. Such knowledge of the distribution of the change time (or initial
conditions) will assist in the quickest detection, i.e., in the estimation of
the change time. The non-Bayesian approach is equivalent to assuming a
uniform prior distribution over the observation set, resulting in a detector
which computes the likelihood function for all possible disorder times.

The second case is the estimation, following detection, of the
statistical model after the change, i.e., of the parameter set $\theta_1$.
For test procedures implemented on line, the use of prior knowledge on the
parameter sets $\theta_0, \theta_1 \in S$ improves the quickest detection,
since in such situations only a short sample is available from the true
change time to the detection time; thus, it is difficult to identify
$\theta_1$.

In this context, we shall consider two different forms of the prior
on the distribution after the change. The first form of prior uses the
composite hypothesis testing framework. As an example, the Darmois-Koopman
family of distributions (Govindarajulu, 1975; Siegmund, 1985),
which is presented in the sequel, allows suitable parametric tests, using the
assumption that the statistics after the change belong to a one-parameter
exponential family. The second form of prior uses the popular method
of multiple models whenever the set of parameters is finite. Such
methods can be found in the literature (Anderson and Moore, 1979).

The  problem  of  detecting  the  change  time  and  estimating  the 
statistical  model  after  the  change  is  a  difficult  task  because  of  the  reasons 
given.  Except  for  cases  where  the  solution  to  the  detection-estimation  can  be 
made  explicit,  like  estimation  of  the  jump  amplitude  in  the  case  of  additive 
changes  in  Gaussian  linear  models  (Willsky  and  Jones,  1976),  the  combined 
detection-estimation  solution  cannot  be  shown  in  a  closed  form.  This  point 
is  further  discussed  in  Chapter  III  when  the  generalized  likelihood  ratio 
algorithm  (GLR)  is  applied  to  linear  models. 

This dissertation focuses on methods for the quickest detection
problem which, in the case of detection of jumps in the mean, provide a
convenient way to estimate the unknown jump. However, it will be shown
that many complicated problems, such as changes in spectral properties or
eigenstructure (changes in state-space models, AR models or ARMA models),
can be transformed to changes in the mean of a statistic function $g_n$,
enabling the use of relatively simple detection schemes to detect rapid
changes in the dynamics of the signal model. As shown in the sequel, such
detection algorithms are based on the cumsum procedure, which provides a
tradeoff between computational efficiency and complexity.

3.  Performance  Evaluation 

In  the  off-line  processing,  the  process  is  observed  only  over  a  finite 
interval,  hence  only  a  finite  number  of  samples  is  used.  The  problem  is  then 
considered  as  that  of  classical  hypothesis  testing  (1-1).  In  this  case  the 
performance  is  measured  in  terms  of  probability  of  detection  versus  the 
probability  of  false  alarm. 
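
For the classical problem (1-1), the tradeoff between the probability of false alarm and the probability of detection can be computed in closed form in simple cases. The sketch below assumes a single Gaussian observation with known variance and a larger mean under $H_1$, so that the likelihood ratio test reduces to comparing $x$ with a threshold; the function name and parameter values are illustrative assumptions.

```python
from statistics import NormalDist


def neyman_pearson_gaussian(mu0, mu1, sigma, alpha):
    """Threshold and power of the test 'decide H1 if x > A'.

    H0: x ~ N(mu0, sigma^2), H1: x ~ N(mu1, sigma^2), with mu1 > mu0.
    """
    std_normal = NormalDist()
    # Size constraint: P(x > A | H0) = alpha
    threshold = mu0 + sigma * std_normal.inv_cdf(1.0 - alpha)
    # Power: P(x > A | H1)
    power = 1.0 - std_normal.cdf((threshold - mu1) / sigma)
    return threshold, power


threshold, power = neyman_pearson_gaussian(mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05)
print(f"threshold A = {threshold:.3f}, probability of detection = {power:.3f}")
```

Sweeping `alpha` over (0, 1) traces the detection probability against the false alarm probability for this simple model.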

In the on-line processing, the approach of "quickest detection" is
adopted as the performance criterion used in sequential analysis. This
approach is used by Nikiforov (Nikiforov, 1979, 1980). For this setting, the
terms run length and average run length (ARL) will be used for the number of
observations needed to reach a detection decision. The ARL function will be
shown to be the main tool in the performance evaluation of the test
procedures. The first time the test statistic (the statistic used by the
stopping rule to determine the change) crosses the predetermined threshold,
which is set according to the desired performance, is called the stopping
time or sometimes also the Markov time (Shiryayev, 1978).

C.  MODEL BASED METHODS

In designing the change detection/estimation algorithms, the philosophy
developed in Chow and Willsky (Chow and Willsky, 1986) distinguishes two
tasks, which are depicted in Figure 1.2.

The first task is the generation of change indication signals (residuals),
sometimes also called error signals. These signals are designed to reflect
the possible changes in the measurements or models and to make a subsequent
detection possible. These signals are designed to have a certain mean
(usually zero) and a white noise correlation signature when no change occurs.
This is referred to as the "white noise" hypothesis. In general the mean
value or spectral properties change under a disorder.

The  second  task  is  design  of  the  stopping  rules  (or  decision  rules)  based 
upon  these  residuals. 

Sometimes an additional task, diagnostics, is added. This is the problem of
estimating the origin of the change (for example: which pole location
changed). A broad class of change detection methods makes explicit use of a
mathematical model of the observed system or signal. For example, the
setting of the system or signal in a state-space form enables the use of
Kalman filtering methods to generate the residuals (innovations in this
case). This twofold problem will be presented next.


Figure 1.2. Model Based Change Detection Scheme as a Twofold Problem
(from Gertler, 1988)

1.  Generating the Change Indication Signals (Residuals)

As  shown  in  Figure  1.2,  modeling  is  an  integral  part  of  the  change 
detection  process,  usually  for  creating  "white"  residuals  under  the  "no 
change"  hypothesis.  Using  the  state-space  setting,  residuals  may  be  generated 
in  a  number  of  different  ways,  which  will  be  presented  briefly. 

a.  Straight  Input-Output  Residuals  (Gertler,  1988) 

Given the state-space model

$$x(n+1) = A\,x(n) + B\,u(n)$$

$$y(n) = C\,x(n)$$

an equivalent input-output model can be presented by using the shift
operator with matrices $G(z)$ and $H(z)$, $z$ being the shift operator and
$H$ being a diagonal matrix:

$$H(z)\,y(n) = G(z)\,u(n)$$

where

$$G(z) = C\,[\mathrm{adj}(zI - A)\,B]$$

$$H(z) = \det(zI - A)\,I.$$

Defining

$$q(n)^T = [u(n)^T,\; y(n)^T]$$

$$F(z) = [G(z),\; -H(z)]$$

then the input-output equation can be written as:

$$F(z)\,q(n) = 0.$$

Consider now the model matrix $\hat{F}(z)$, which represents the
discrepancies between the input-output models $\hat{G}(z)$ and $\hat{H}(z)$
and the true system $G(z)$ and $H(z)$:

$$\hat{F}(z) = [\hat{G}(z),\; -\hat{H}(z)]$$

where

$$\hat{G}(z) = G(z) + \Delta G(z,t); \qquad \hat{H}(z) = H(z) + \Delta H(z,t).$$

Such discrepancies may account for plant faults or changes. Applying this
equation to the measurements $q(n)$ with the model matrix $\hat{F}(z)$ yields
the residuals vector $e(n)$:

$$\hat{F}(z)\,q(n) = e(n).$$
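
A scalar special case may help fix ideas. The sketch below assumes a first-order plant $x(n+1) = a\,x(n) + b\,u(n)$, $y(n) = x(n)$, so that $G(z) = b$ and $H(z) = z - a$. The residual is formed with a fixed nominal model while the plant pole changes partway through the record; the function names and numerical values are illustrative assumptions, not taken from Gertler (1988).

```python
def simulate_plant(a_seq, b, u, x0=0.0):
    """Simulate y(n) = x(n), x(n+1) = a(n) x(n) + b u(n); returns len(u) + 1 outputs."""
    x, y = x0, []
    for a_n, u_n in zip(a_seq, u):
        y.append(x)
        x = a_n * x + b * u_n
    y.append(x)
    return y


def io_residuals(y, u, a_model, b_model):
    """Residual e(n) = G(z) u(n) - H(z) y(n) with G(z) = b_model, H(z) = z - a_model."""
    return [b_model * u[n] - (y[n + 1] - a_model * y[n]) for n in range(len(u))]


u = [1.0] * 20
a_true = [0.5] * 10 + [0.8] * 10  # plant pole changes at n = 10
y = simulate_plant(a_true, b=1.0, u=u)
e = io_residuals(y, u, a_model=0.5, b_model=1.0)
print([round(v, 3) for v in e])
```

While the model matches the plant the residual vanishes (up to rounding); after the pole changes, the residual equals the parameter discrepancy times the output, which is what a subsequent statistical test would monitor.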

b.  Filtering  and  Parameter  Identification  Methods 

A popular solution (Willsky, 1976) consists of monitoring the
innovations or the prediction errors, using estimation filters or parameter
identification methods. Using the optimal state estimator, the Kalman filter
is designed according to the "normal mode" or no change situation. If prior
knowledge about the change is available, or if a diagnosis is required in
addition to detection, a possible solution consists of using a bank of Kalman
filters designed according to all the possible models for each hypothesis
(see Figure 1.3). Notice that under the null hypothesis the Kalman filter
produces zero mean and independent residuals. Consequently, deviations from
this behavior are indicators of change. However, in some practical problems,
it may be necessary to monitor a function of the innovations rather than the
innovations themselves (Basseville, 1988).

Figure 1.3. Filtering Methods for Generating the Residuals.

(a)  "normal mode" filter

(b)  generating error signatures due to possible change hypotheses


In identification-based methods, a residual quantity is defined in
relation to the plant parameters. The plant is identified in a fault-free
reference situation, then repeatedly on line. The results of the latter are
compared to the reference values and a parameter error (residual) is formed.
Remark: these model-based methods do not include explicit model switching.
In Chapter VI such methods will be described, thus enabling us to modify the
Kalman filter to detect the change.
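
The innovation monitoring described in this subsection can be sketched with a scalar Kalman filter. The example assumes a random-walk state model $x(n+1) = x(n) + w(n)$, $y(n) = x(n) + v(n)$ with hypothetical noise variances `q` and `r`, and a noise-free record with a level jump so that the effect on the innovations is easy to see.

```python
def kalman_innovations(y, q, r, x0=0.0, p0=1.0):
    """Innovations of a scalar Kalman filter for x(n+1) = x(n) + w(n), y(n) = x(n) + v(n)."""
    x_hat, p = x0, p0
    innovations = []
    for y_n in y:
        p_pred = p + q                # time update of the error variance
        innovation = y_n - x_hat      # prediction error (predicted state is x_hat)
        gain = p_pred / (p_pred + r)  # Kalman gain
        x_hat = x_hat + gain * innovation
        p = (1.0 - gain) * p_pred     # measurement update of the error variance
        innovations.append(innovation)
    return innovations


y = [1.0] * 30 + [3.0] * 5  # constant level with a jump at sample 31
nu = kalman_innovations(y, q=0.01, r=1.0)
print(round(nu[29], 3), round(nu[30], 3))
```

The filter is tuned to the "normal mode"; once it has converged the innovations are close to zero, and the jump appears as a large innovation that decays only slowly, which is the deviation a stopping rule would act on.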

c.  Redundancy  Methods 

These techniques are used primarily for failure detection (sensor
failures). Two classes can be distinguished. The first class is direct or
physical redundancy. Using several identical sensors measuring the same
quantities, the differences between each possible pair may reflect a change.
These residuals are processed using voting methods (Willsky, 1976). Another
approach consists of searching subsets of measurements for inconsistency,
thus indicating changes.
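
A very simple form of voting over pairwise differences can be sketched as follows. This is only an illustrative majority-style rule under the assumption of at most one faulty sensor; it is not the specific voting scheme of Willsky (1976), and the function name and tolerance are hypothetical.

```python
from itertools import combinations


def vote_faulty_sensor(readings, tolerance):
    """Return the index of the single sensor that disagrees with all others, else None."""
    n = len(readings)
    disagreements = [0] * n
    # Pairwise differences act as the residuals
    for i, j in combinations(range(n), 2):
        if abs(readings[i] - readings[j]) > tolerance:
            disagreements[i] += 1
            disagreements[j] += 1
    # A sensor is suspect if it disagrees with every other sensor
    suspects = [i for i, count in enumerate(disagreements) if count == n - 1]
    return suspects[0] if len(suspects) == 1 else None


print(vote_faulty_sensor([10.0, 10.1, 12.5], tolerance=0.5))
print(vote_faulty_sensor([10.0, 10.1, 9.9], tolerance=0.5))
```

With only two sensors a disagreement cannot be attributed to either one, so the function returns `None` in that case; at least three identical sensors are needed for isolation.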

The second class is indirect or analytical redundancy. This
method monitors all the existing relationships between the inputs and the
outputs that are zero under the hypothesis that no change exists. These
techniques were studied by Deckert (Deckert et al., 1977), Chow and Willsky
(Chow and Willsky, 1980, 1984), and others.

2.  Statistical  Testing  (stopping  rules) 

The  resulting  residual  vector  contains  the  combined  effects  of  the 
changes  and  the  noise  (as  well  as  the  modeling  errors).  Two  approaches  can 
be  considered. 

The first consists of the deterministic modeling of the changes. (It is
important not to confuse the random nature of the change time and (usually)
the change magnitude with the deterministic modeling of the change.) For
example, consider the case

$$e(n) - \theta(n) \sim N(0, \sigma^2)$$

where

$$\theta(n) = \begin{cases} \theta_0, & 1 \le n \le \nu - 1 \\ \theta_1, & n \ge \nu \end{cases}$$

$\nu$ being the change time (random). Therefore, the effect of changes on the
residuals has to be separated from that of the noise. This is done by
statistical testing, making use of the assumption of the non-changing
statistical structure of the noise, versus the changing statistical nature of
the observations (change in mean, variance, etc.).

In  the  second  approach  the  observed  changes  in  the  time  series  are 
modeled  in  a  statistical  manner.  Therefore,  the  noise  is  part  of  the  modeling. 
Hence,  the  statistical  nature  of  the  changes  can  be  modeled  as  changes  in  the 
noise  characteristics. 

Several  testing  methods  will  be  described  briefly,  while  the  main  part 
of  the  dissertation  will  focus  on  a  subset  of  them. 

a.  Compound Scalar Testing ($\chi^2$-type off-line test)

Consider a single scalar test statistic

$$e(n)^T\, \Sigma_e^{-1}\, e(n) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; A$$

where $e(n)$ is the residual vector and $\Sigma_e$ is the covariance matrix
of the vector $e$. Under the no change hypothesis, the residual vector $e(n)$
consists of normal i.i.d. components. Hence, the test statistic follows a
chi-square distribution with $p$ degrees of freedom ($p$ being the vector
size or number of residuals), and the threshold $A$ is selected from this
distribution. Recursive chi-square tests are also available.
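
The compound scalar test can be sketched for a residual vector of size $p = 2$, where the chi-square tail probability has the closed form $P(X > A) = e^{-A/2}$ and the threshold can therefore be computed without tables. The covariance matrix, the residual values and the function names below are assumptions chosen for illustration.

```python
import math


def chi2_statistic_2d(e, cov):
    """Quadratic form e^T cov^{-1} e for a 2-vector e and a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix written out explicitly
    inv = ((d / det, -b / det), (-c / det, a / det))
    return sum(e[i] * inv[i][j] * e[j] for i in range(2) for j in range(2))


def chi2_threshold_2dof(alpha):
    """Threshold A with P(X > A) = alpha for a chi-square variable with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


cov = ((1.0, 0.5), (0.5, 2.0))
A = chi2_threshold_2dof(0.05)
for residual in ([0.5, -0.3], [3.0, 0.0]):
    stat = chi2_statistic_2d(residual, cov)
    print(residual, round(stat, 3), "change" if stat > A else "no change")
```

For other values of $p$ the threshold would have to come from a chi-square quantile routine or a table; the closed form used here holds only for two degrees of freedom.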


b.  Likelihood-oriented  Methods 


The likelihood ratio approach is a general tool for change
detection. Different methods can be considered (Ghosh, 1970; Willsky, 1980).
For example, consider a test which compares the hypothesis $H_1$ of nonzero
residual mean to the null hypothesis $H_0$ of zero mean. The decision is
based on the likelihood ratio between the joint distributions of the
residuals

$$S_1^n = \log \frac{P\{e(1), e(2), \ldots, e(n) \mid H_1\}}{P\{e(1), e(2), \ldots, e(n) \mid H_0\}} \tag{1-7}$$

The numerator and the denominator, respectively, are the probability
densities of the observed time series under the two hypotheses. If the
residuals are independent, then (1-7) is easy to compute. Under the
hypothesis testing given by (1-2):

$$S_1^n(e) = \log \prod_{i=1}^{n} \frac{P_1(e_i)}{P_0(e_i)}$$

If the residuals monitored are the innovations of a Kalman filter, then it
can be shown (Anderson and Moore, 1979) that the distribution of these
innovations is given by the conditional distribution of the observations
$x_i$ (conditioned by their past values); hence, (1-7) can be written in the
general form

$$S_1^n = \sum_{i=1}^{n} \log \frac{P_1(x_i \mid x_{i-1}, \ldots, x_0)}{P_0(x_i \mid x_{i-1}, \ldots, x_0)} \tag{1-8}$$

This kind of test is called a cumulative sum test (or cumsum test) and can be
written as

$$S_k^n = \sum_{i=k}^{n} s(x_i) \tag{1-9}$$

where

$$s(x_i) = \log \frac{P_1(x_i \mid x_{i-1}, \ldots, x_0)}{P_0(x_i \mid x_{i-1}, \ldots, x_0)}$$

Notice that in this case the computation of $S_k^n$ is recursive.

The tests based on (1-8), (1-9) are stopping rules (i.e., tests which enable us to estimate the change time $\nu$), based upon the knowledge of the parameterized densities before and after the change. In this case the estimated stopping time can be found by using the maximum likelihood estimate (MLE) under $H_1$, namely

$$\hat{\nu} = \arg\max_{1 \le \nu \le n} S_\nu^n \tag{1-10}$$
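As a hedged illustration of (1-10), the following Python sketch scans all candidate change times using suffix sums of the increments $g(x_i)$. The function name and the tie-breaking rule (the latest maximizing index wins) are assumptions made for this example only.

```python
def estimate_change_time(increments):
    """Return (nu_hat, S) where nu_hat maximizes S_nu^n = sum_{i=nu}^{n} g_i over 1 <= nu <= n.

    `increments` holds g(x_1), ..., g(x_n); indices in the result are 1-based.
    """
    best_nu, best_s = None, float("-inf")
    s = 0.0
    # Walk backward so that s equals the suffix sum S_nu^n at each step.
    for nu in range(len(increments), 0, -1):
        s += increments[nu - 1]
        if s > best_s:  # strict '>' keeps the latest maximizer on ties
            best_nu, best_s = nu, s
    return best_nu, best_s
```

For increments that are negative before the change and positive after it, the maximizer sits at the first post-change sample.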


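In general, the statistical properties after the change (i.e., the parameter $\theta_1$ in the parameterized form $p_{\theta_1}$ of $p_1$) are not known. Hence, the cumsum test (1-9) can be used to reach the change decision

$$\max_{1 \le \nu \le n} \; \max_{\theta_1} \; S_\nu^n(\theta_0, \theta_1) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \lambda \tag{1-11}$$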


This test is called the generalized likelihood ratio (GLR) test (Willsky and Jones, 1976) and involves a double maximization of high computational cost. Only in special cases, such as additive changes in linear systems modeled in the state-space representation, can it be shown (Willsky and Jones, 1976) that the resulting changes in the innovation vector $e_n$ are also additive. Therefore, in the case of Gaussian state and observation noises, there are cases for which explicit solutions for $\theta_1$ exist. For example, if $\theta_1$ represents the mean after the change, then the maximization over $\theta_1$ is explicit (Basseville, 1988), resulting in joint estimation of the vector $(\nu, \theta_1)$ by recursive computation of $S_\nu^n$ and $\hat{\nu}_n$.
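The explicit inner maximization can be illustrated for the simplest case. The sketch below assumes independent Gaussian observations with known variance and known pre-change mean `mu0`; under that assumption the maximizing post-change mean for a candidate change time is the sample mean of the trailing segment, and the maximized log-likelihood ratio is the squared segment sum divided by $2\sigma^2$ times the segment length. The function name and return convention are hypothetical.

```python
def glr_mean_change(xs, mu0=0.0, sigma=1.0):
    """GLR statistic for an unknown mean after the change (Gaussian, known variance).

    Returns (statistic, nu_hat, theta1_hat) with a 1-based change-time estimate.
    """
    n = len(xs)
    best = (float("-inf"), None, None)
    total = 0.0
    for nu in range(n, 0, -1):
        total += xs[nu - 1] - mu0          # sum of centered samples x_nu .. x_n
        m = n - nu + 1                      # segment length
        # max over theta1 of sum_{i=nu}^{n} log p_theta1(x_i)/p_mu0(x_i)
        stat = total**2 / (2.0 * sigma**2 * m)
        if stat > best[0]:
            best = (stat, nu, mu0 + total / m)
    return best
```

The outer maximization over the change time is still a scan over all candidates, which is the cost the text refers to; only the inner maximization has become closed-form.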

The theoretical optimality of the GLR has been investigated recently (Deshayes and Picard, 1986) from the off-line point of view. They show that under asymptotic exponential decay rates of the error probabilities $\alpha, \beta$ (where $\alpha$ is the Type I error probability, or false alarm probability, and $\beta$ is the Type II error probability, or probability of a miss) and for specific families of distributions, the GLR tests are uniformly most powerful (UMP).

Remarks 

•  The stopping rule based upon a cumsum statistic can use any general nonlinearity $g(\cdot)$. For example, instead of the probability ratio of conditioned observations as in (1-9), a probability ratio of the observations $x_i$ can be used. In this case

$$g(x_i) = \log \frac{p_1(x_i)}{p_0(x_i)}$$

•  Both off-line and on-line implementations (using "sliding" windows) can be used. Examples of this method for ARMA and AR models can be found in the literature (Segen and Sanderson, 1980; Basseville, 1986; Basseville and Benveniste, 1983).


c.  The  Statistical  Local  Approach 

This approach is used in order to overcome the main drawback of the GLR test, namely its computational cost due to the double maximization. It was introduced by Nikiforov (Nikiforov, 1986) for on-line detection of changes in spectral characteristics of ARMA models. The original idea consists of looking for small changes in models and using a special type of Taylor expansion of the log-likelihood function. Thus, the nonlinearity $g(\cdot)$ becomes

Deshayes and Picard (Deshayes and Picard, 1986) showed that for the statistic $g(x_n)$ there exists a central limit theorem. Any change in $\theta$ is reflected as a change in $g(x_n)$, for which stopping rules based on cumsum tests can be designed.

d.  Bayesian  Oriented  Methods 

Bayesian oriented methods are based upon some prior statistical knowledge of the change time, or use some knowledge of the switching model that describes the statistical behavior of changes. The use of hidden Markov models to describe the changes in state-space models (Shumway, 1990) is very popular, and leads to some change detection algorithms. However, in the Bayesian framework, it is very difficult to find a general solution because of the use of different cost functions or different prior assumptions. As mentioned in Section B.2 of this chapter, Shiryayev (Shiryayev, 1977) introduced a Bayesian competitor as an alternative to the Page cumsum test. Recently, Pollak (Pollak, 1985) proved an optimality property for the Shiryayev-Roberts rule.


(1-12)

e.  Heuristics Associated with a Two-model Approach (Basseville, 1986)

This method, called the "two-model approach," is in fact a simplification of the GLR test. Implementation of GLR tests leads to "boundary" problems, because models are not very reliable when identified on short segments. The two-model approach was introduced in order to overcome this problem. These algorithms are less efficient than likelihood ratio methods but more efficient than the tests based upon the local approach.

D.  ORGANIZATION  OF  THE  DISSERTATION 

This dissertation focuses on the on-line analysis of detection algorithms; hence, the quickest detection methods are explored. Both the non-Bayesian and Bayesian points of view are investigated, but the focus is on non-Bayesian (maximum likelihood) methods. In this context, we investigate sequential analysis and a certain type of cumulative sum procedure that generalizes a test first studied by Page (Page, 1954) to detect a change in the distribution of random variables observed at random times. The Bayesian point of view is also included. Shiryayev's results (Shiryayev, 1978) are shown to play a key role in any Bayesian approximation.

Different disorder types (Types 1, 2, 3, and 4) are investigated throughout the dissertation in the sequential (on-line) detection framework.

The body of this dissertation is divided into four groups as shown in Figure 1.4. Chapters II and III form the maximum likelihood solution of the detection problem while Chapter V presents the Bayesian approach. Chapter IV provides additional tools to analyze the performance of both the non-Bayesian and Bayesian methods by using diffusion type approximations. Finally, Chapter VI presents a MAP estimator for a Type 4 problem, namely, discontinuity type disorder.


[Figure 1.4: block diagram. Recoverable labels: Type I, II, and III Disorder Problems; Type IV; Maximum Likelihood Methods; Diffusion Approximations; Bayesian Methods; Conditions; Performance and Evaluation; Non-parametric Procedure; CH VII Conclusions.]

Figure 1.4. Sequential Methods for Quickest Disorder Detection

Each  chapter  includes  an  introduction  and  a  summary  section  which  will 
assist  in  relating  all  the  topics  presented  throughout  this  dissertation.  An 
appendix  which  summarizes  the  basic  concepts  of  hypothesis  testing  and 
detection theory is also provided.

II. SEQUENTIAL METHODS FOR QUICKEST DETECTION OF CHANGES IN PROBABILITY: THE NON-BAYESIAN FRAMEWORK

A.  INTRODUCTION 

Consider the observation process $\{x_n\}$ with probability density $p_\theta(x_n)$ or conditional probability density $p_\theta(x_n \mid x_{n-1}, \ldots, x_0)$ depending upon an unknown parameter $\theta$. This parameter can describe two different situations. In the first situation, $\theta$ can be, for example, the mean or variance of the density of the time series, and will directly reflect the statistical properties of the time series. In the second situation, $\theta$ denotes some convenient parameterization of a system or signal (e.g., the state-space representation or ARMA modeling) and describes the dynamics of that system or signal.

In the context of detecting jumps (sudden changes) in the parameter set $\theta$, we are interested in detecting changes in the dynamics, or in the statistical properties, of complicated structures.

Since the jump time is unknown, the problem is twofold: detection of the change, and estimation of the change time. In this chapter we will focus on the detection problem only. As shown in the last chapter, there are different issues that must be addressed: on-line versus off-line implementation, parametric versus non-parametric methods, and so on. These issues were briefly presented in Chapter I, and are investigated in more detail in the context of change detection in this chapter. In particular, we will address the following points:




1.  Off-line  versus  On-line  Viewpoints 

Even  though  the  final  goal  is  to  implement  on-line  (sequential) 
procedures,  the  off-line  viewpoint  is  significant,  since  it  can  be  used  to  derive 
on-line  tests.  This  point  will  be  clarified  in  this  chapter.  These  two 
viewpoints  differ  in:  (a)  problem  formulation  and  (b)  performance  evaluation 
as  related  to  different  criteria. 

In the off-line formulation, the change detection problem is implemented as multiple hypothesis testing, for which the Neyman-Pearson lemma is not valid, so that no UMP test exists. Thus, the criterion from this viewpoint is that of classical detection problems, namely the size and power of the test.

In the on-line formulation, the criterion is modified to detect a change in the parameter $\theta$ as quickly as possible. From the on-line point of view the detection is performed by a stopping rule of the general form

$$N = \inf\{n : S_n \ge \lambda\}$$

$S_n$ being an appropriate test statistic (see Chapter I).

The performance of a stopping rule is evaluated by $T$, the mean time between false alarms (1-6), and by $D$, the delay for detection (1-5), as proposed by Lorden (Lorden, 1971). This is a "minimax" type of average delay, referred to as the best least favorable change time.

The difference between the off-line and the on-line viewpoints is significant: whereas no optimal test exists in the off-line framework, optimal stopping rules do exist in the on-line framework for independent identically distributed (i.i.d.) sequences with known distributions before and after the change. Moustakides (Moustakides, 1986) extended this result to the non-i.i.d. case. Since in this chapter we take the non-Bayesian approach, another difference appears: in off-line processing we assume a uniform prior distribution over the whole observation set, resulting in a likelihood detector which computes the likelihood for all possible disorder times, whereas in the on-line approach the disorder time is assumed to be an unknown parameter. Lorden showed (Lorden, 1971) that a certain class of stopping rules called cumulative sum (cumsum) tests are optimal in the sense of his criterion. The cumsum tests form a rich enough family of tests, and are the focus of investigation of this chapter. In particular, the test called the Page-Hinkley stopping rule is investigated in depth.
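A cumsum stopping rule of the Page-Hinkley type can be sketched as follows. The sketch assumes the commonly used textbook form, in which an alarm is raised the first time the cumulative sum exceeds its running minimum by the threshold, together with the equivalent reflected recursion usually attributed to Page; the function names and the convention of returning `None` when no alarm occurs are illustrative choices, not the notation of this dissertation.

```python
def page_hinkley_stopping_time(increments, threshold):
    """First n (1-based) with S_n - min_{0<=k<=n} S_k >= threshold, or None if no alarm."""
    s = 0.0       # cumulative sum S_n, with S_0 = 0
    s_min = 0.0   # running minimum of S_0, ..., S_n
    for n, z in enumerate(increments, start=1):
        s += z
        s_min = min(s_min, s)
        if s - s_min >= threshold:
            return n
    return None


def page_recursion_stopping_time(increments, threshold):
    """Same rule written as the reflected recursion W_n = max(0, W_{n-1} + z_n)."""
    w = 0.0
    for n, z in enumerate(increments, start=1):
        w = max(0.0, w + z)
        if w >= threshold:
            return n
    return None
```

Both functions implement the same rule, since $S_n - \min_{k \le n} S_k$ satisfies the reflected recursion; the second form needs only one stored value per step.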

2.  Composite  Testing 

As mentioned earlier, optimal stopping rules do exist in the case of i.i.d. sequences with known distributions before and after the change. When the distribution after the change is not known, a composite framework needs to be used. This issue is addressed by using the Darmois-Koopman distribution for a one-parameter exponential family.

3.  Parametric  versus  Non-parametric  Methods 

The nonlinearities or transformations $g(\cdot)$ used for the cumsum detection procedures (1-9) can have a parametric or non-parametric form (sign, rank tests, etc.). The analyses will provide a general framework which can be used for either type.

B.  ORGANIZATION OF THIS CHAPTER

The  main  goal  of  this  chapter  is  the  analysis  of  sequential  methods  for 
change  detection,  namely,  the  cumsum  procedures  and  in  particular  the  Page- 
Hinkley  stopping  rule.  The  purpose  is  to  set  a  general  framework  in  which 
the  transformation  (nonlinearity)  used  can  be  of  a  general  form  (different 
parametric  and  non-parametric  forms).  Thus,  the  following  two  sections  (C 
and  D)  can  be  considered  as  a  "guided  tour"  through  theorems  and  results 
needed  to  understand  and  analyze  cumsum  procedures  and  their 
performance  (presented  in  Sections  E  and  F). 

In  Section  C,  sequential  tests  known  as  one-sided  and  two-sided  Wald 
tests  are  presented  in  the  classical  detection  formulation.  Some  basic 
theorems  (Wald  identity)  which  are  shown  to  be  important  for  the  general 
disorder  or  change  detection  are  presented. 

In Section D, the sequential tests implemented with the log-likelihood function, known as Sequential Probability Ratio Tests (SPRT), are presented. Optimal properties of these tests are shown. The performance evaluation of the one- and two-sided SPRT, known as Wald's approximations, is analyzed. Within this framework, composite SPRTs using the Koopman-Darmois family of distributions are presented. Basic performance measures in the presence of strong and weak changes are shown.

In Section E, we introduce the cumsum stopping rules in the on-line framework (using Lorden's criterion). The Page test is presented and shown to be a repeated one-sided Wald test. Both the off-line and on-line viewpoints are presented. Observing the renewal property of cumsum tests, and using ladder variables and results from queueing theory, new aspects of cumsum tests are addressed. The Page test is also shown to be a maximum likelihood detector. Finally, optimal properties of the Page test are presented; this test is shown to be optimal in the sense of Lorden's criterion.

Section F presents the performance evaluation of Page's test. The run length function is shown to be the primary tool needed for the analysis of the delay and average false alarm rate of the test. Using the results in Sections C and D we derive two results known as Lorden's and Wald's approximations. Finally, the asymptotic performance framework is introduced and used for two important results: first, the asymptotic approximation of the run length function, and second, the generalization of Lorden's results to general nonlinearities, other than the log-likelihood transformation used in the Page stopping rule. A general framework for asymptotic performance evaluation of Page's test is provided. The resulting measure is shown to be usable for any nonlinearity in the presence of various noise distributions.

Section G presents a short summary of the main results of this chapter.

C.  SEQUENTIAL TESTS

An  alternative  approach  to  the  fixed  size  tests  is  to  fix  the  desired 
performance  and  allow  the  number  of  measurements  to  vary  in  order  to 
achieve  this  performance. 

To formulate the problem, suppose that the observations $\{x_k;\ k = 1, 2, \ldots\}$ are i.i.d. and distributed according to

$$\begin{aligned} H_0 &: \; x_k \sim P_0, \quad k = 1, 2, \ldots \\ &\text{versus} \\ H_1 &: \; x_k \sim P_1, \quad k = 1, 2, \ldots \end{aligned} \tag{2-1}$$

where $P_0$ and $P_1$ are two possible distributions. A sequential test is defined by the pair of indicator sequences $(\varphi, d)$ where:

$\varphi = \{\varphi_k;\ k = 0, 1, 2, \ldots\}$ is the stopping rule indicator ($\varphi_k : \mathbb{R}^k \to \{0, 1\}$),

$d = \{d_k;\ k = 0, 1, 2, \ldots\}$ is called the terminal decision rule.

For an observation sequence $\{x_k;\ k = 0, 1, 2, \ldots\}$ the rule $(\varphi, d)$ makes a decision $d_n(x_1, x_2, \ldots, x_n)$ whether or not any change occurred. In particular, sequential tests can be described as follows: continue sampling as long as $\varphi_n(x_1, x_2, \ldots, x_n) = 0$, and stop when $\varphi_n(x_1, x_2, \ldots, x_n) = 1$. We define two kinds of tests: two-sided and one-sided.

The two-sided sequential test is based on the definition of the cumulative sum:

$$S_n = S_{n-1} + g(x_n), \quad n = 1, 2, \ldots, \qquad S_0 = s \tag{2-2}$$

where $g : \mathbb{R} \to \mathbb{R}$ is a memoryless function of the observations, and $s$ is called the initial score.

We detect a change according to the following stopping rule:

$$\varphi_n(x_1, x_2, \ldots, x_n) = \begin{cases} 0 & \text{if } S_n \in (b, a) \quad \text{(continue)} \\ 1 & \text{if } S_n \notin (b, a) \quad \text{(stop)} \end{cases} \tag{2-3}$$

where $a$, $b$ are the stopping thresholds, $b < 0 < a$.
Also, the terminal decision indicator is given by:

$$d_n(x_1, x_2, \ldots, x_n) = \begin{cases} 0 & \text{if } S_n \le b \quad \text{(no disorder)} \\ 1 & \text{if } S_n \ge a \quad \text{(disorder).} \end{cases}$$

The stopping time $N$ (sometimes called the sample size or the run length of the test) is defined as:

$$N = \inf\{n : S_n \notin (b, a)\}$$

and the exit times are defined by:

$$N_a = \inf\{n : S_n \ge a\}, \qquad N_b = \inf\{n : S_n \le b\}. \tag{2-4}$$

The error probabilities for the two-sided tests are defined as:

$$\alpha = \Pr\{S_N \ge a \mid H = H_0\}, \qquad \beta = \Pr\{S_N \le b \mid H = H_1\}.$$

In classical detection theory, $\alpha$ is defined as the probability of false alarm, and $\beta$ is the probability of a miss. In terms of hypothesis testing, the acceptance zone $W_a$ is defined as the zone where $x_k \in W_a$ or $x_k \sim P_0$. The rejection zone $W_r$ is defined by $x_k \in W_r$ or $x_k \sim P_1$ (disorder zone). The indifference zone $W_i$ is defined by $x_k \in \mathbb{R} - W_a - W_r$ (see Figure 2.1).
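The two-sided test defined above can be sketched in Python as follows. This is a minimal illustration, not the dissertation's implementation: the function name, the 0/1 coding of the terminal decision, and the convention of returning `(None, None)` when the data run out before a boundary is reached are all assumptions made for the example.

```python
def sequential_test(xs, g, a, b, s=0.0):
    """Two-sided sequential test ST(a, b, g) with initial score s.

    Returns (N, d): the stopping time N (1-based) and the terminal decision d
    (1 = disorder, 0 = no disorder), or (None, None) if no boundary is crossed.
    """
    if not (b < 0.0 < a):
        raise ValueError("thresholds must satisfy b < 0 < a")
    total = s
    for n, x in enumerate(xs, start=1):
        total += g(x)          # cumulative sum update, as in (2-2)
        if total >= a:         # exit through the upper boundary
            return n, 1
        if total <= b:         # exit through the lower boundary
            return n, 0
    return None, None
```

A call such as `sequential_test(data, lambda x: x - 0.5, a=1.2, b=-1.2)` uses a simple shifted-identity nonlinearity; any memoryless function can be passed as `g`.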

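The one-sided test is defined by letting $b \to -\infty$, with

$$\varphi_n(x_1, x_2, \ldots, x_n) = \begin{cases} 0 & \text{if } S_n < a \quad \text{(continue)} \\ 1 & \text{if } S_n \ge a \quad \text{(stop)} \end{cases}$$

and

$$d_n(x_1, x_2, \ldots, x_n) = \begin{cases} 0 & \text{if } S_n < a \quad \text{(no disorder)} \\ 1 & \text{if } S_n \ge a \quad \text{(disorder)} \end{cases} \tag{2-5}$$

and the stopping time is given by

$$N = \begin{cases} \inf\{n : S_n \ge a\} & \\ \infty & \text{if } S_n < a \text{ for all } n. \end{cases}$$

The error probabilities are defined in this case as:

$$\alpha = \Pr\{S_N \ge a \mid H_0\}, \qquad \beta = \Pr\{S_N < a \mid H_1\}.$$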


Figure  2.1.  Two-sided  Sequential  Test 


1.  The  Fundamental  Identity  (Wald's  Identity)  of  the  Sequential 
Analysis 

This identity forms the basis of the subsequent analysis of the Operating Characteristic (OC) and ARL functions of a Sequential Test (ST). It gives a convenient way to derive the moments of the sample size required to terminate the ST.

Theorem (Wald, 1947):

Let $x_1, x_2, \ldots$ be independent random variables, let $ST = ST(a, b, g)$ be any sequential test of $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ based on i.i.d. $\{g(x_n)\}$, and let $N$ be the stopping time for this test. Let $\psi_i(h)$ denote the moment generating function of the random variable $g(x)$ under the hypothesis $H_i$:

$$\psi_i(h) = E\{\exp(h \cdot g(x)) \mid H_i\}, \quad i = 0, 1$$

for every real $h$ for which $\psi_i(h)$ is bounded. Then, if $P\{g(x) = 0 \mid H_i\} < 1$ and $|\psi_i(h)| \ge 1$, we have:

$$E\{\exp(h S_N)\,[\psi_i(h)]^{-N} \mid H_i\} = 1, \quad i = 0, 1. \tag{2-6}$$

The proof can be found in Ghosh (Ghosh, 1970, p. 208) or Feller (Feller, 1971, p. 603).


2.  Applications  of  Wald's  Identity 

As a direct corollary to Wald's identity, immediate results for the ARL function can be obtained. Define $z = g(x)$. Then the average run length (ARL) is given by Govindarajulu (Govindarajulu, 1975):

$$E\{N \mid \theta_i\} = \begin{cases} \dfrac{E\{S_N \mid \theta_i\}}{E\{z \mid \theta_i\}} & \text{if } E\{z \mid \theta_i\} = \psi_i'(0) \ne 0 \\[2ex] \dfrac{E\{S_N^2 \mid \theta_i\}}{E\{z^2 \mid \theta_i\}} & \text{if } E\{z \mid \theta_i\} = \psi_i'(0) = 0. \end{cases} \tag{2-7}$$

Bounds on the stopping thresholds can be associated with the $ST(a, b, g)$:

$$a \le \log \frac{1 - \beta}{\alpha}, \qquad b \ge \log \frac{\beta}{1 - \alpha}. \tag{2-8}$$

The strict equalities hold if and only if $b < 0 < a$, and in terms of the error probabilities for $|a|, |b| > 0$:

$$\alpha \le (1 - \beta)\, e^{-a}, \qquad \beta \le (1 - \alpha)\, e^{b}. \tag{2-9}$$

These approximations are known as Wald's approximations and were derived by ignoring the "excess over the boundaries" (Siegmund, 1985). Notice that we can get yet cruder inequalities when we consider the asymptotic case where $\alpha \downarrow 0$, $\beta \downarrow 0$. Then:

$$\alpha \le e^{-a}, \qquad \beta \le e^{b}.$$

3.  Comparison  of  Sequential  Tests  (ST)  and  Fixed  Sample  Size  Tests 
(FSST) 


Our objective is to investigate the number of samples saved by an $ST(a, b, g)$ over the corresponding optimum FSST, both designed to achieve the same performance $(\alpha, \beta)$.

The relative efficiency of $ST(a, b, g)$ at $\theta$ is defined (Ghosh, 1970) by

$$RE(\theta) = \frac{n(\alpha, \beta)}{E\{N \mid \theta\}}$$

where $n(\alpha, \beta)$ is the sample size required by the FSST and $E\{N \mid \theta\}$ is the ARL function of the ST, both designed to achieve the same performance $(\alpha, \beta)$. It can be shown (Poor, 1988) that for the case of simple sequential detection of a constant signal in the presence of white Gaussian noise, using a likelihood ratio detector for both the ST and the FSST, the limiting RE is given by

$$\lim_{\alpha = \beta \to 0} RE = 4.$$

Thus, for vanishingly small error probabilities (with $\alpha = \beta$), the SPRT requires on the average only one-fourth as many samples as the FSST does. Further discussion can be found in Ghosh (Ghosh, 1970).
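The limiting value can be explored numerically. The sketch below is an illustration under explicit assumptions: a mean shift of size `d` (in units of the noise standard deviation) in white Gaussian noise, the standard fixed-sample-size formula based on normal quantiles, and Wald's approximation (overshoot neglected) for the SPRT run length under the alternative. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fsst_sample_size(alpha, beta, d):
    """Fixed sample size achieving (alpha, beta) for a Gaussian mean shift d (not rounded)."""
    z = NormalDist().inv_cdf
    return ((z(1.0 - alpha) + z(1.0 - beta)) / d) ** 2


def sprt_arl_h1(alpha, beta, d):
    """Wald's approximation to the SPRT average run length under H1."""
    num = (1.0 - beta) * math.log((1.0 - beta) / alpha) \
        + beta * math.log(beta / (1.0 - alpha))
    return num / (0.5 * d * d)   # E{g(x) | theta_1} = d^2 / 2 for the log-likelihood ratio


def relative_efficiency(alpha, d=1.0):
    """RE = n(alpha, alpha) / E{N | theta_1} for equal error probabilities."""
    return fsst_sample_size(alpha, alpha, d) / sprt_arl_h1(alpha, alpha, d)
```

Under these approximations the ratio does not depend on `d`, stays below 4 for moderate error probabilities, and increases toward 4 only slowly as the error probabilities shrink.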


D.  SPRT  TESTS 


When the test procedure given by (2-2) and (2-3) uses the log-likelihood ratio as the nonlinearity $g(x)$,

$$g(x) = \log \frac{dP(x \mid \theta_1)}{dP(x \mid \theta_0)},$$

the sequential test is called the sequential probability ratio test (SPRT). The relation between any $ST(a, b, g)$ and the SPRT$(A, B)$ is given by

$$A = e^{a}, \qquad B = e^{b}$$

where

$$b < 0 < a \quad \text{and} \quad 0 < B < 1 < A.$$

The bound approximations (2-8), (2-9) can be converted to SPRT terms by substituting $e^{a} = A$ and $e^{b} = B$.

The SPRT has a fundamental property which is extremely important and will be used in the sequel (Therrien, 1989):

$$\begin{aligned} \text{under disorder:} \quad & E\{g(x_i) \mid \theta_1\} > 0 \\ \text{under no disorder:} \quad & E\{g(x_i) \mid \theta_0\} < 0. \end{aligned} \tag{2-10}$$
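The sign property (2-10) can be checked directly, as sketched here under the assumption that the two distributions have densities $p_0$ and $p_1$ with common support. The symbol $D(\cdot \,\|\, \cdot)$ for the Kullback-Leibler divergence is notation introduced only for this illustration.

$$E\{g(x) \mid \theta_1\} = \int p_1(x) \log \frac{p_1(x)}{p_0(x)}\,dx = D(p_1 \,\|\, p_0) \ge 0, \qquad E\{g(x) \mid \theta_0\} = -\int p_0(x) \log \frac{p_0(x)}{p_1(x)}\,dx = -D(p_0 \,\|\, p_1) \le 0,$$

with equality only when $p_0 = p_1$ almost everywhere. For example, for a Gaussian mean shift from $\mu_0$ to $\mu_1$ with common variance $\sigma^2$, both divergences equal $(\mu_1 - \mu_0)^2 / (2\sigma^2)$, so the cumulative sum drifts downward at that rate before the change and upward at the same rate after it.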

In the following sections, several properties of the SPRT will be presented, and the two- and one-sided SPRTs will be analyzed, followed by the composite hypothesis framework for the SPRT.

1.  Optimal Properties of SPRT

For testing a simple hypothesis against a simple alternative with i.i.d. observations, the SPRT is optimal among all sequential and fixed sample size tests in the sense of minimizing the expected run length both under $H_0$ and under $H_1$ among all the tests having no larger error probabilities. The following theorem establishes this result.

The Wald-Wolfowitz Theorem (1948):

Among all tests (FSST and ST) for which $\Pr\{\text{accept } H_1 \mid H_0\} \le \alpha$ and $\Pr\{\text{accept } H_0 \mid H_1\} \le \beta$, and for which $E\{N \mid \theta_i\} < \infty$, $i = 0, 1$, the SPRT with error probabilities $\alpha$ and $\beta$ minimizes both $E\{N \mid \theta_0\}$ and $E\{N \mid \theta_1\}$. □

The proof can be found in (Ghosh, 1970). The optimal property of the SPRT can be viewed as analogous to the Neyman-Pearson lemma.

Definition (Wijsman, 1960):

An SPRT is said to have a monotonicity property if, when the upper stopping bound of the SPRT is increased and the lower bound is decreased, at least one of the error probabilities decreases.

Theorem (Lehmann, 1959)

Let $x_1, x_2, \ldots$ be independent random variables having probability density $p(x, \theta)$ which has monotone likelihood ratio. Then any SPRT for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ ($\theta_0 < \theta_1$) has a nondecreasing power function. □

The proof can be found in Lehmann (Lehmann, 1986).

2.  The  Termination  Property  of  SPRT 

The SPRT is a closed test if and only if the termination property holds for every $\theta \in \Theta$. When the $g(x_i)$ are i.i.d., any SPRT is closed under the following mild restriction (Poor, 1988):

Suppose that for any $\theta \in \Theta$, the $g(x_i)$ are i.i.d. random variables and $P\{g(x_i) = 0 \mid \theta\} < 1$; then:

•  $\lim_{n \to \infty} \Pr\{N > n \mid \theta\} = 0$.

•  there exists a $t_0 > 0$ such that the moment generating function $E\{e^{tN} \mid \theta\}$ exists for all real $t < t_0$.

This means that the entire statistics of $N$ can be found. The result is that the SPRT or the $ST([\beta/(1-\alpha)], [(1-\beta)/\alpha])$ based on Wald's approximations are always closed. Ghosh (Ghosh, 1970) extended the result to the situation where the $g(x_i)$ are not i.i.d.

Another optimal property of the SPRT was shown by Wald, who established a lower bound on the ARL of competitors of the SPRT. This result will be presented in the sequel (2-24) when presenting the problem of composite hypothesis testing.

3.  The Operational Characteristic (OC) and ARL Functions of Two-sided SPRT Tests

The use of Wald's identity (2-6) forms the basis of certain bounds for the OC function $Q(\theta)$ and the ARL function of the SPRT.

Wald's Approximations

Wald's approximations are based on the use of the moment generating function of $g(x)$ (2-6), provided that we can find two nonzero real numbers $h_0$ and $h_1$ such that

$$\psi_i(h_i) = E\{\exp(h_i\, g(x)) \mid \theta_i\} = 1, \quad i = 0, 1. \tag{2-11}$$

Existence and uniqueness of such roots are guaranteed when $g(x)$ has a nonzero mean and satisfies certain other conditions (Feller, 1971).

The key result for our purposes is that $\psi(h) = 1$ has:

•  one and only one nonzero root, with

$$\begin{aligned} -\infty < h(\theta) < 0 \quad & \text{if } E\{g(x) \mid \theta\} = E_\theta\{g(x)\} > 0 \\ 0 < h(\theta) < \infty \quad & \text{if } E\{g(x) \mid \theta\} = E_\theta\{g(x)\} < 0. \end{aligned} \tag{2-12}$$

•  no nonzero real root if $E\{g(x) \mid \theta\} = 0$.

When we try to detect a change from a negative trend $E\{g(x) \mid \theta_0\} < 0$ to a positive trend $E\{g(x) \mid \theta_1\} > 0$, it follows that $h(\theta_1) < 0 < h(\theta_0)$.
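The nonzero root can be found numerically when no closed form is available. The Python sketch below is a hedged illustration: it assumes the caller supplies the logarithm of the moment generating function of $g(x)$ (a convex function vanishing at zero, with slope at zero equal to the mean), and it locates the root by bracketing and bisection. The helper name `nonzero_root` and the fixed number of bisection steps are choices made for this example.

```python
def nonzero_root(log_mgf, mean, steps=200):
    """Nonzero h with log_mgf(h) = 0, i.e. psi(h) = 1; returns None when mean == 0.

    Assumes log_mgf is convex with log_mgf(0) = 0 and derivative `mean` at 0,
    and that a nonzero root exists.
    """
    if mean == 0.0:
        return None
    sign = -1.0 if mean > 0.0 else 1.0     # the root has the opposite sign of the mean
    f = lambda t: log_mgf(sign * t)         # f < 0 just beyond 0, f > 0 beyond the root
    lo, hi = 1.0, 1.0
    while f(lo) >= 0.0:                     # shrink until strictly inside (0, root)
        lo /= 2.0
    while f(hi) <= 0.0:                     # grow until strictly beyond the root
        hi *= 2.0
    for _ in range(steps):                  # plain bisection on the magnitude
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)
```

For Gaussian increments with mean $m$ and variance $s^2$ the log moment generating function is $hm + h^2 s^2 / 2$, so the root is $-2m/s^2$; for the log-likelihood ratio of a unit mean shift this gives $h(\theta_0) = 1$ and $h(\theta_1) = -1$, in agreement with the sign pattern stated above.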

Notice that the roots are functions of two quantities: the probability density of the observations $p(x)$ and the nonlinearity $g(x)$. The approximate formulas for the OC and ARL of the SPRT $ST(a, b)$ are derived when the $g(x_i)$ are i.i.d. and $b < 0 < a$, using the assumption of no excess of $S_n$ over $a$ and $b$.

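Lower bounds for the ARL function of the two-sided SPRT under hypotheses $H_0$ and $H_1$ in terms of the error probabilities are given by Wald (Wald, 1947):

$$\begin{aligned} L(\theta_0) = E_0\{N\} &\ge \frac{(1-\alpha)\log\dfrac{\beta}{1-\alpha} + \alpha\log\dfrac{1-\beta}{\alpha}}{E\{g(x) \mid \theta_0\}} && \text{if } \theta = \theta_0 \\[2ex] L(\theta_1) = E_1\{N\} &\ge \frac{\beta\log\dfrac{\beta}{1-\alpha} + (1-\beta)\log\dfrac{1-\beta}{\alpha}}{E\{g(x) \mid \theta_1\}} && \text{if } \theta = \theta_1. \end{aligned} \tag{2-13}$$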

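Bounds for the operational characteristic function $Q(\theta)$ are given by Ghosh (Ghosh, 1970):

•  For detecting a change of positive trend (upward change), $h(\theta_0) > 0$:

$$\frac{\exp\{h(\theta) a\} - 1}{\exp\{h(\theta) a\} - \eta(\theta)\exp\{h(\theta) b\}} \le Q(\theta) \le \frac{\delta(\theta)\exp\{h(\theta) a\} - 1}{\delta(\theta)\exp\{h(\theta) a\} - \exp\{h(\theta) b\}}$$

•  For detecting a change of negative trend (downward change), $h(\theta_0) < 0$:

$$\frac{\exp\{h(\theta) a\} - 1}{\exp\{h(\theta) a\} - \delta(\theta)\exp\{h(\theta) b\}} \le Q(\theta) \le \frac{\eta(\theta)\exp\{h(\theta) a\} - 1}{\eta(\theta)\exp\{h(\theta) a\} - \exp\{h(\theta) b\}}$$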

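where

$$\begin{aligned} \eta(\theta) &= \inf_{\rho > 1} \rho\, E\{\exp\{h(\theta) g(x)\} \mid \exp\{h(\theta) g(x)\} \le 1/\rho;\ \theta\} \le 1 \\ \delta(\theta) &= \sup_{0 < \rho < 1} \rho\, E\{\exp\{h(\theta) g(x)\} \mid \exp\{h(\theta) g(x)\} \ge 1/\rho;\ \theta\} \ge 1. \end{aligned} \tag{2-14}$$

Recall that for any test, the OC should result in $Q(\theta_0) \ge 1 - \alpha$ and $Q(\theta_1) \le \beta$ (see Appendix). Thus, the motivation is to find bounds for the ARL in terms of the OC function and the stopping thresholds $a$, $b$. These upper and lower bounds for the ARL are given by Ghosh (Ghosh, 1970):

$$L(\theta) = E_\theta\{N\} \; \begin{cases} \le \dfrac{[a + \gamma(\theta)][1 - Q(\theta)] + b\,Q(\theta)}{E\{g(x) \mid \theta\}} & \text{if } E\{g(x) \mid \theta\} > 0 \\[2ex] \ge \dfrac{[a + \gamma(\theta)][1 - Q(\theta)] + b\,Q(\theta)}{E\{g(x) \mid \theta\}} & \text{if } E\{g(x) \mid \theta\} < 0 \end{cases} \tag{2-15}$$

where $\gamma(\theta) = \sup_{r} E\{g(x) - r \mid g(x) \ge r > 0;\ \theta\}$ is the "excess over the boundary." The mean time between false alarms $T$ is given by $L(\theta_0)$ while the delay for detection is given by $L(\theta_1)$.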

Detecting a change from a negative trend $E\{g(x) \mid \theta_0\} < 0$ to a positive trend $E\{g(x) \mid \theta_1\} > 0$ (upward change) can result in effective bounds for $L(\theta)$. Notice that $Q(\theta_0) \ge 1 - \alpha$ and $Q(\theta_1) \le \beta$ result in consistent inequality directions in (2-15). Thus, upper bounds can be evaluated in the case of disorder detection. Similarly, bounds for $L(\theta)$ in the case of detecting a change from a positive trend to a negative one (downward change) can be found by reversing the inequalities in (2-15).

4.  The Operational Characteristic (OC) and ARL Functions for One-sided SPRT Tests

For detecting a change from a negative to a positive trend, the probability that the one-sided test does not stop under $\theta_0$ can be found by using the limit of the two-sided OC function as $b$ tends to negative infinity. Thus, this probability is lower bounded by:

$$\begin{aligned} \Pr\{L(\theta_0) = \infty\} &= \lim_{b \to -\infty} Q(\theta_0) \\ &\ge \lim_{b \to -\infty} \frac{\exp\{h(\theta_0) a\} - 1}{\exp\{h(\theta_0) a\} - \eta(\theta_0)\exp\{h(\theta_0) b\}} \\ &= \frac{\exp\{h(\theta_0) a\} - 1}{\exp\{h(\theta_0) a\}}. \end{aligned}$$

Notice that the obtained lower bound avoids the use of the functions $\delta(\theta)$ and $\eta(\theta)$, which are difficult to generalize.

The probability that the one-sided test terminates under $\theta_0$, which is the size ($\alpha$) of the one-sided test, is upper bounded by

$$\begin{aligned} \alpha &= \Pr\{S_N \ge a \mid \theta_0\} = \Pr\{L(\theta_0) < \infty\} \\ &= 1 - \Pr\{L(\theta_0) = \infty\} \\ &\le \exp\{-h(\theta_0) \cdot a\}. \end{aligned} \tag{2-16}$$

This result is very important and will be used in the sequel when analyzing the cumsum procedures under Lorden's criterion.
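The bound (2-16) can be checked by simulation. The following Python sketch is a rough illustration under explicit assumptions: unit-variance Gaussian data, a mean shift `d` under the alternative, the log-likelihood ratio as the nonlinearity (for which the nonzero root under the null hypothesis equals 1, so the bound reads $\alpha \le e^{-a}$), and a finite simulation horizon standing in for the open-ended test. The function name, the seed, and the default parameters are hypothetical choices.

```python
import math
import random


def false_alarm_estimate(a, d=1.0, trials=2000, horizon=100, seed=0):
    """Monte Carlo estimate of Pr{S_n >= a for some n <= horizon} under H0.

    Truncating at `horizon` can only lower the estimate; under H0 the sum
    drifts downward at rate d^2 / 2 per step, so late crossings are rare.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0.0
        for _ in range(horizon):
            x = rng.gauss(0.0, 1.0)      # observation drawn under H0: N(0, 1)
            s += d * (x - 0.5 * d)       # log-likelihood ratio of N(d, 1) versus N(0, 1)
            if s >= a:
                hits += 1
                break
    return hits / trials
```

The estimate typically falls well below $e^{-a}$, because the bound ignores the excess of the cumulative sum over the threshold at the moment of crossing.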

An upper bound for the ARL function of the one-sided test under $\theta_1$ (delay for detection) is obtained by using the upper bound for the ARL of the two-sided test (2-15):

$$L(\theta_1) = E_1\{N\} \le \lim_{b \to -\infty} \frac{[a + \gamma(\theta_1)][1 - Q(\theta_1)] + b\,Q(\theta_1)}{E\{g(x) \mid \theta_1\}}.$$

As $b \to -\infty$, the OC function $Q(\theta_1)$ is a decreasing function (monotonicity property of the SPRT) and $E\{g(x) \mid \theta_1\} > 0$ (detecting a positive trend); hence, the right-hand side is a decreasing function and the inequality is preserved as $Q(\theta_1) \to 0$, resulting in:

$$L(\theta_1) = E_1\{N\} = E\{N \mid \theta_1\} \le \frac{a + \gamma(\theta_1)}{E\{g(x) \mid \theta_1\}}, \qquad E\{g(x) \mid \theta_1\} > 0. \tag{2-17}$$


5.  SPRT  for  Composite  Hypotheses 

Although the SPRT was derived from a test of a simple hypothesis against a simple hypothesis, it was shown that from the on-line point of view of detecting abrupt changes, optimal stopping rules do exist in the case of i.i.d. observations with known distributions before and after the change. When the distribution after the change is not known, some other hypotheses can be considered. Thus, it is natural to test, for example, $H_0: \theta < \theta^*$ against $H_1: \theta > \theta^*$.

Wald  (Wald,  1947)  considered  the  method  of  weight  functions  in 
order  to  deal  with  unknown  composite  alternatives  where  the  alternative 
may  be  a  parameter  within  a  surface  (Rejection  Region).  If  the  method  of 
weight  function  is  not  feasible,  so-called  open-ended  (one-sided)  likelihood 
ratio  test  procedures  can  be  considered.  Lorden  (Lorden,  1971)  investigated 
that  approach  for  the  problem  of  open  ended  (one-sided)  tests  for  the  one 
parameter  exponential  Darmois-Koopman  families  of  distributions.  This 
approach leads to easily computed procedures to obtain approximations to the detection probability and ARL functions of the SPRT of composite hypotheses using only the theory developed for simple hypotheses. The following shows that this is generally possible in the context of a one-parameter exponential family, and will form the basis for Lorden's cumsum procedure.


a.  Composite  Testing  for  Darmois-Koopman  Distribution  Families 
(Siegmund,  1985) 

Consider a general SPRT test defined by (2-2) and (2-3) with the additional assumption that $x_1, x_2, \ldots$ are i.i.d., so that

$$S_n = \log\prod_{i=1}^{n}\frac{P_1(x_i)}{P_0(x_i)}.$$


Next we follow Siegmund's analysis (Siegmund, 1985) to derive a new observation. Let $P$, $P^*$ be third and fourth density functions, such that the original test of $P_0$ against $P_1$ is equivalent to a test of $P$ against $P^*$ with new stopping boundary values, such that

$$\frac{P^*(x)}{P(x)} = \left[\frac{P_1(x)}{P_0(x)}\right]^{\theta_1}, \qquad \theta_1 \ne 0. \tag{2-18}$$


Note that $P^*(x)$ must satisfy

$$\int_{-\infty}^{\infty}\left[\frac{P_1(x)}{P_0(x)}\right]^{\theta_1} P(x)\,dx = 1.$$


Define

$$z(x) = \log\frac{P_1(x)}{P_0(x)}$$

hence

$$\int_{-\infty}^{\infty} e^{\theta_1 z(x)} P(x)\,dx = 1.$$


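Now we define a function $b(\theta)$ such that

$$e^{b(\theta)} = \int_{-\infty}^{\infty} e^{\theta z(x)} P(x)\,dx \tag{2-19}$$

where $\theta$ represents the statistical "distance" between the null and the alternative hypotheses. Notice that $b(\theta_1) = b(0) = 0$. If the last integral converges then

$$\int_{-\infty}^{\infty} e^{\theta z(x) - b(\theta)} P(x)\,dx = 1$$

and

$$P_\theta(x) = e^{z(x)\theta - b(\theta)} P(x)$$

represents the new test since

$$\frac{P_{\theta_1}(x)}{P(x)} = \left[\frac{P_1(x)}{P_0(x)}\right]^{\theta_1}. \tag{2-20}$$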


The resulting test defines a one-parameter exponential family of distributions under which composite tests can be evaluated easily.

Differentiation of (2-19) w.r.t. $\theta$ gives:

$$b'(\theta) = \int z(x) P_\theta(x)\,dx = E_\theta\{z(x)\}$$

and

$$b''(\theta) = \int [z(x)]^2 P_\theta(x)\,dx - [b'(\theta)]^2 = \operatorname{var}_\theta(z(x)) > 0 \tag{2-21}$$

so $b(\theta)$ is convex. The desired $\theta_1$ satisfying (2-18) exists if and only if

$$b'(0) = \int z(x) P(x)\,dx = E_0\{z(x)\} \ne 0$$

since $b(\theta)$ is convex and $b(\theta_1) = b(0) = 0$ (see Figure 2.2).
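The original test of $P_0$ against $P_1$,

$$N = \inf\left\{n : \prod_{i=1}^{n}\frac{P_1(x_i)}{P_0(x_i)} \notin (e^{b}, e^{a})\right\},$$

is equivalent to a test of

$$N = \inf\left\{n : \prod_{i=1}^{n}\frac{P_{\theta_1}(x_i)}{P(x_i)} \notin (e^{\theta_1 b}, e^{\theta_1 a})\right\}. \tag{2-22}$$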


Since $P(x)$ represents the null hypothesis under (2-18), it is clear that for composite testing, $b'(0) = E\{z(x_1)\} = E\{z(x) \mid H_0\} \ne 0$ implies that under the null hypothesis (no disorder) the test should give a negative trend, $b'(0) < 0$, when detecting a change from a negative to a positive trend (see Figure 2.2).
This  result  is  consistent  with  another  one  which  is  presented  in  the  sequel. 
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namely,  that  it  is  worthwhile  to  bias  the  detector  if  it  is  known  that  before  the 
disorder  occurs,  the  test  will  have  zero  mean.  This  gives  some  degree  of 
robustness  to  the  test  under  composite  hypothesis  testing. 
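
As an illustration (an assumed example, not taken from the source), take unit-variance Gaussian densities $P_0 = N(0,1)$ and $P_1 = N(\mu,1)$, and choose $P = P_0$. Under these assumptions the construction (2-18) to (2-21) can be worked out in closed form:

```latex
\begin{align*}
z(x) &= \log\frac{P_1(x)}{P_0(x)} = \mu x - \tfrac{1}{2}\mu^2, \\
e^{b(\theta)} &= \int e^{\theta z(x)} P_0(x)\,dx
   = \exp\!\left\{-\tfrac{1}{2}\theta\mu^2 + \tfrac{1}{2}\theta^2\mu^2\right\}
   \;\Longrightarrow\; b(\theta) = \tfrac{1}{2}\mu^2\,\theta(\theta - 1), \\
b(0) &= b(1) = 0 \;\Longrightarrow\; \theta_1 = 1, \\
b'(\theta) &= \tfrac{1}{2}\mu^2(2\theta - 1), \qquad b'(0) = -\tfrac{1}{2}\mu^2 < 0, \\
b''(\theta) &= \mu^2 = \operatorname{var}_\theta(z(x)) > 0, \\
P_\theta(x) &= e^{\theta z(x) - b(\theta)} P_0(x) = N(\theta\mu,\, 1).
\end{align*}
```

In this sketch the family $P_\theta$ interpolates between the null ($\theta = 0$) and the alternative ($\theta = 1$), and the negative slope $b'(0)$ is the negative pre-change trend discussed above.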

b.  Performance  Evaluation 

The  following  proposition  establishes  an  important  result  about 
the  performance  of  the  SPRT  within  the  composite  framework.  This  result 
will  be  shown  to  play  a  key  role  in  Lorden's  work  about  the  optimality  of 
Page's  test  in  the  on-line  framework  (minimum  average  delay  for  detection), 
by  assigning  a  lower  bound  on  the  ARL  for  competitors  of  Page's  test. 


Proposition  (Wald,  1947): 

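Given a two-sided sequential test of $H_0: x \in \theta_0$ against $H_1: x \in \theta$, suppose $N_1$ and $N_2$ are stopping times for $x_1, x_2, \ldots \in X$ such that:

$$P_{\theta_0}\{N_1 < \infty\} \le \alpha < 1 \quad \text{and} \quad P_\theta\{N_2 < \infty\} \le \beta < 1$$

where $\alpha$ and $\beta$ are the false alarm and miss probabilities (respectively).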

Define: 


$$I(\theta, \theta_0) = E_\theta\left\{\log\frac{f_\theta(x)}{f_{\theta_0}(x)}\right\} \tag{2-23}$$

this is the information number or the Kullback-Leibler number. Then:

• $I(\theta, \theta_0)\,E_\theta\{\min(N_1, N_2)\} \ge (1 - \beta)\ln\alpha^{-1} - \ln 2$

• and for $N_2 \equiv +\infty$ (one-sided test) and $\beta \downarrow 0$, (2-24)


The proof can be found in Wald's book (Wald, 1947, p. 197).
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Remark:  The  last  proposition  gives  a  lower  bound  on  the  average  delay  for 
detection $D = E\{N \mid H_1\}$ for any stopping rule for which $\alpha = P_0(N < \infty) < 1$ (see
Figure  2.3).  For  the  SPRT  we  have  the  approximate  relations  (2-13)  between 
the  ARL  and  the  error  probabilities.  The  last  proposition  generalizes  (2-17)  for 
composite  tests,  in  asserting  that  the  ARL  function  in  (2-13)  is  approximately 
minimal. 


Figure  2.3.  Sequential  Test  Exit  Times 


c.  Performance  of  Sequential  Composite  Tests  in  the  Presence  of  a 
Weak  Signal 

In  classical  detection  theory,  the  locally  optimum  detector 
maximizes  the  slope  of  the  power  curve  with  respect  to  signal  strength 
(evaluated  at  zero  signal  strength  or  at  the  presence  of  a  weak  signal)  for  a 
fixed  false  alarm  (Neyman-Pearson  locally  optimum  procedure).  In  this 
composite alternative hypotheses approach, the alternatives $\theta$ are close in the sense of metric or distance to the null hypothesis $\theta_0$. It can be shown (Kassam, 1987) (Poor, 1988) (Kazakos, 1977) that the locally optimum detector in the classical detection problem of $H_0: x_i \sim P(x \mid \theta_0)$ versus $H_1: x_i \sim P_1(x) = P(x \mid \theta)$ for $\theta > \theta_0$ is given by:


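$$g_{lo}(x) = \frac{\left.\dfrac{\partial P(x \mid \theta)}{\partial\theta}\right|_{\theta=\theta_0}}{P(x \mid \theta_0)} = -\frac{P_0'(x)}{P_0(x)} = \left.\frac{\partial}{\partial\theta}\ln\{P(x \mid \theta)\}\right|_{\theta=\theta_0} \tag{2-25}$$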


where $\theta - \theta_0$ indicates the "distance" between $H_1$ and $H_0$, and $\theta \to \theta_0$ indicates a weak signal situation, resulting in the locally most powerful (LMP) nonlinearity $g_{lo}$. For the ST defined by (2-2) we can define the Signal-to-Noise Ratio (SNR) (Kassam, 1988):


$$\mathrm{SNR} \triangleq \frac{(E\{S_n \mid \theta\})^2}{\operatorname{var}\{S_n \mid \theta_0\}}.$$

We seek to maximize the SNR when $\theta \to \theta_0$.

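The efficacy $\varepsilon$ of a test is defined (Kassam, 1988) as the limiting incremental signal-to-noise ratio:

$$\varepsilon(g) = \lim_{\theta \to \theta_0} \frac{\left[\left.\dfrac{\partial}{\partial\theta}E\{S_n \mid \theta\}\right|_{\theta=\theta_0}\right]^2}{n \cdot \operatorname{var}\{S_n \mid \theta_0\}} = \frac{\left[\left.\dfrac{\partial}{\partial\theta}E\{g(x) \mid \theta\}\right|_{\theta=\theta_0}\right]^2}{\operatorname{var}\{g(x) \mid \theta_0\}}$$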
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The  nonlinearity  g(x)  that  maximizes  the  efficacy  is  the  local  optimum 
nonlinearity $g_{lo}(x) = -P_0'(x)/P_0(x)$, which is also the Neyman-Pearson locally optimum procedure. In this case the efficacy is equal to Fisher's information for a location shift test, namely: $H_0: x_i \sim f(x)$ versus $H_1: x_i \sim f(x - \theta)$, and is given
by  (Kassam,  1967^; 


6.  Practical  Criticism  of  the  SPRT  and  Truncated  Tests 

The optimality property of the SPRT is a remarkably strong property but it applies only to simple hypotheses. Even for the simple case of constant signal detectors as shown in (Poor, 1988), it is necessary to know the signal value in order to implement the test. This is in contrast to the Fixed Sample Size tests which are UMP for $\theta > 0$. For applications involving composite testing, the open continuation region can lead to very large sample sizes, especially when $E\{\log[f_1(x)/f_0(x)]\} \approx 0$. Thus, although the sample size of the SPRT is finite with probability 1, it is not bounded. This difficulty can be overcome by modifying the SPRT to stop sampling and make a hard (single-threshold) decision after the sample size has reached some maximum number of samples. This type of test is known as the truncated test and can be described as follows: The sequential test is defined by

$$N = \inf\{n : S_n \notin (b, a)\}.$$

In  the  absence  of  a  definite  upper  bound  on  N,  we  define  an  upper  bound  M. 
Hence,  the  new  (truncated)  stopping  rule  is  given  by 

min(N,M). 
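
The truncated rule $\min(N, M)$ can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `truncated_sprt`, the convention that the caller supplies the score increments $g(x_n)$, and the hard-decision threshold `hard_threshold` (defaulting to 0) are all assumptions made here.

```python
from typing import Iterable, Tuple


def truncated_sprt(increments: Iterable[float], b: float, a: float,
                   max_samples: int, hard_threshold: float = 0.0) -> Tuple[int, str]:
    """Run an SPRT on the score increments and truncate it at max_samples.

    Stops at the first n with S_n >= a (decide H1) or S_n <= b (decide H0).
    If neither boundary is reached after max_samples samples, a hard
    single-threshold decision is made by comparing S_n with hard_threshold
    (an assumed rule, chosen here only for illustration).
    """
    s = 0.0
    n = 0
    for n, g in enumerate(increments, start=1):
        s += g
        if s >= a:
            return n, "H1"
        if s <= b:
            return n, "H0"
        if n >= max_samples:
            return n, "H1" if s > hard_threshold else "H0"
    # The data ran out before any stopping condition was met.
    return n, "undecided"
```

For example, `truncated_sprt([0.1] * 100, b=-5.0, a=5.0, max_samples=4)` stops at the truncation point $M = 4$ rather than waiting for a boundary crossing.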




Another problem associated with the SPRT is the estimation following detection. If we want to stop sampling as soon as it is possible to tell in which of two subsets of the parameter space a parameter lies, then usually the estimation procedure will require an adequate number of samples which is larger than the sample size. A possible solution is to artificially enforce a larger sample size. However, sequentially stopped versions of the estimators are biased, while we would like to consider unbiased estimators. The problem of estimation following sequential tests is not a part of this work. However, in the disorder detection framework, we are interested in randomly stopped averages where $m$ is a random variable. The Anscombe-Doeblin theorem (Siegmund, 1985) shows that such averages are asymptotically normal under quite general conditions.


E. CUMSUM PROCEDURES


Assuming that a given process has i.i.d. observations $x_1, x_2, \ldots$, whose distribution possibly changes from $P_0$ to $P_1$ at an unknown point in time $\nu$, then, in the hypothesis testing framework the problem can be presented as:

$$H_0: x_1, x_2, \ldots \sim P_0$$

versus

$$H_\nu: \begin{cases} x_1, x_2, \ldots, x_{\nu-1} \sim P_0 \\ x_\nu, x_{\nu+1}, \ldots \sim P_1 \end{cases} \qquad \nu \ge 1 \tag{2-26}$$


Let $P_\nu$ and $E_\nu$ denote the probability measure and the expectation under $H_\nu$, respectively, when the change from $P_0$ to $P_1$ occurs at the $\nu$th sample ($\nu = 1, 2, \ldots$). Let $P_0$ denote the probability measure when there is no change, i.e., $\nu = \infty$. Using Lorden's definition (Lorden, 1971):

$$D = \bar{E}_1\{N\} = \sup_{\nu \ge 1}\,\operatorname{ess\,sup}\, E_\nu\{(N - \nu + 1)^+ \mid x_1, x_2, \ldots, x_{\nu-1}\}, \quad \text{where } (a)^+ = \max(0, a) \tag{2-27a}$$

which is the worst possible, or least favorable, conditional expected detection delay or quickness of reaction to a change (disorder). Thus, a "minimax" type of criterion is defined for which the delay $D$ is the smallest value such that for every $\nu \ge 1$

$$E_\nu\{(N - \nu + 1)^+ \mid x_1, x_2, \ldots, x_{\nu-1}\} \le \bar{E}_1\{N\}$$

almost surely under $P_0$.

The goal is to find the stopping time $N$ which allows the quickest detection of the change, subject to:

$$E_0\{N\} \ge \gamma. \tag{2-27b}$$

The constraint implies that if the change does not occur, then the expected time to a false alarm is no less than the threshold $\gamma$ (where $\gamma \to \infty$ asymptotically).

Several ad hoc procedures have been proposed to solve this multiple hypothesis problem, in which at least one of the $H_\nu$ holds ($1 \le \nu \le n$) against $H_0$. The most well known procedures are the Page-Hinkley and Shiryayev-Roberts tests, which will be presented in the sequel. Both are based on the probability ratio and hence presume the properties presented in the last section.

1. The Page Cumsum Test (Page, 1954)

Page's  procedure  has  two  equivalent  implementations:  Recursive 
test  which  can  be  considered  as  a  repeated  modified  one-sided  SPRT  test  with 
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constant  stopping  limits,  and  a  Repeated  SPRT  with  a  moving  indifference 


zone. 


a.  Repeated  SPRT  with  Moving  Indifference  Zone 
Consider the test:

$$g_n = \max_{0 \le k \le n}(S_n - S_k) = S_n - \min_{0 \le k \le n} S_k$$

where

$$S_n = \sum_{j=1}^{n}\log\frac{P_1(x_j)}{P_0(x_j)}, \qquad S_0 = 0. \tag{2-28}$$

The indifference interval equals $(0, a)$. The stopping rule based upon (2-28) is defined as

$$N^* = \inf\Big\{n : S_n - \min_{0 \le k \le n} S_k \ge a\Big\}. \tag{2-29}$$


Note that $g_n = S_n - \min_{0 \le k \le n} S_k$ measures the current height of the random walk $S_k$, $k = 0, 1, 2, \ldots$ above its minimum value. Whenever the random walk establishes a new minimum, the process forgets its past and starts again in the sense of a renewal process (see Figure 2.4):

$$S_{n+m} - \min_{0 \le k \le n+m} S_k = S_{n+m} - S_n - \min_{0 \le k \le m}(S_{n+k} - S_n) \tag{2-30}$$

This  renewal  property  has  important  consequences.  It  means  that 
N*  can  be  defined  in  terms  of  a  sequence  of  one-sided  SPRTs  as  follows: 
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The  overall  stopping  time  is  given  by: 

$$N^* = N_1 + N_2 + \cdots + N_M \tag{2-32}$$

and is called the extended stopping time, since it consists of a sum of single SPRT stopping times, where:

$$M = \inf\Big\{m : S_{N_1 + \cdots + N_m} - \min_{0 \le k \le N_1 + \cdots + N_m} S_k \ge a\Big\} \tag{2-33}$$

is the number of repetitions (renewals). By (2-33), $M$ is geometrically distributed (see also (Siegmund, 1985)) with:

$$E\{M\} = 1/\Pr\{S_{N_1} \ge a\}.$$

Define $N^* = \sum_{i=1}^{M} N_i$; using Wald's identity (2-7) we obtain

$$E\{N^*\} = \frac{E\{N_1\}}{\Pr\{S_{N_1} \ge a\}} \tag{2-34}$$

which expresses the expected extended stopping time in terms of the expected sample size and error probability of a single SPRT.

b.  Recursive  Implementation 

As will be shown, the recursive implementation has two
interpretations. 

The  first  is  the  relation  to  the  repeated  one-sided  Wald  sequential 
test  with  boundaries  0  and  a,  which  forms  a  renewal  process  whenever  the 
random  walk  Sn  hits  the  lower  boundary  0  (see  Figure  2.5).  The  renewal 
process  is  repeated  until  such  time  that  a  Wald  test  exceeds  the  threshold  a. 
Thus, the process $S_n$ can be described as

$$S_n = \max\{0,\, S_{n-1} + g(x_n)\}, \qquad S_0 = s \tag{2-35}$$

while the extended stopping time is given by:

$$N^* = \inf\{n : S_n \ge a\}$$


This representation is equivalent to the original Wald test stopping rule:

$$N = \inf\{n : S_n \le 0 \ \text{or}\ S_n \ge a\}$$

which is repeated from the initial score $S_0$ each time $S_n \le 0$ (zero being the renewal boundary, hence the name: repeated one-sided Wald test), and so on, until such time that a Wald test exceeds the threshold $a$.
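
The recursion (2-35) translates directly into code. The sketch below is illustrative only: the names `page_test` and `gaussian_llr` are hypothetical, and the Gaussian mean-shift log-likelihood ratio is an assumed choice of the nonlinearity $g(x)$, not one prescribed by the text.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def gaussian_llr(x: float, mu0: float, mu1: float, sigma: float) -> float:
    """Log-likelihood ratio log[P1(x)/P0(x)] for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)


def page_test(samples: Iterable[float], score: Callable[[float], float],
              a: float, s0: float = 0.0) -> Tuple[Optional[int], List[float]]:
    """Recursive Page test: S_n = max(0, S_{n-1} + g(x_n)), stop when S_n >= a.

    Returns the stopping index N* (1-based), or None if the threshold is
    never reached, together with the path of the test statistic.
    """
    s = s0
    path: List[float] = []
    for n, x in enumerate(samples, start=1):
        s = max(0.0, s + score(x))  # reset to zero at the renewal boundary
        path.append(s)
        if s >= a:
            return n, path
    return None, path


# Usage: the mean jumps from 0 to 1 at the fourth sample.
data = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
stop, path = page_test(data, lambda x: gaussian_llr(x, 0.0, 1.0, 1.0), a=1.5)
```

Before the change each increment is $-0.5$ and the statistic stays at the renewal boundary; after the change it climbs by $0.5$ per sample until it reaches the threshold.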


Figure  2.5.  Recursive  Implementation  of  Page's  Test 


The  second  interpretation  of  the  recursive  algorithm  is  related  to 
the  connection  of  the  one-sided  first  passage  problem  with  a  single  server 
queue.  It  will  be  shown  that  a  queueing  process  Wn  can  be  described  in  terms 
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of  a  random  walk  S„.  This  fact  forms  the  basis  of  the  asymptotic  distribution 
analysis of $w_n$ as $n \to \infty$.

Assume the customers arrive at a single server at times $a_1 < a_1 + a_2 < a_1 + a_2 + a_3 < \ldots$ These arrival times are the arrival epochs, which form a renewal process. Assume $a_1, a_2, \ldots$ are i.i.d. and let $b_n$ ($n = 1, 2, \ldots$) denote the service time and $w_n$ the waiting time of the $n$th customer. Suppose that the $(n-1)$th customer arrives at epoch $t$. His service time starts at epoch $t + w_{n-1}$ and terminates at $t + w_{n-1} + b_{n-1}$ (see Figure 2.6). The next customer arrives at time $t + a_n$. He finds the server free if $w_{n-1} + b_{n-1} < a_n$, but has a waiting time (server busy) $w_n = w_{n-1} + b_{n-1} - a_n$ if this quantity is greater than or equal to 0. Denote the queueing process by $x_n = b_{n-1} - a_n$. In short:

$$w_n = \begin{cases} w_{n-1} + x_n & \text{if } w_{n-1} + x_n \ge 0 \\ 0 & \text{if } w_{n-1} + x_n < 0 \end{cases}$$

or: $w_n = \max\{0,\, w_{n-1} + x_n\}$, $w_0 = 0$.
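
The waiting-time recursion can be checked against a direct simulation of the queue. The following is a small sketch under assumed conventions: customers are indexed from 0, `service[i]` is the service time of customer `i`, and `interarrival[i]` is the time between the arrivals of customers `i` and `i + 1`; both function names are hypothetical.

```python
from typing import List, Sequence


def lindley_waiting_times(service: Sequence[float],
                          interarrival: Sequence[float]) -> List[float]:
    """Waiting times from the recursion w_n = max(0, w_{n-1} + x_n), w_0 = 0,
    with x_n = (service time of customer n-1) - (interarrival time to customer n)."""
    w = [0.0]
    for b_prev, a_next in zip(service, interarrival):
        w.append(max(0.0, w[-1] + b_prev - a_next))
    return w


def simulated_waiting_times(service: Sequence[float],
                            interarrival: Sequence[float]) -> List[float]:
    """Same waiting times obtained by tracking arrival and departure epochs."""
    arrival = 0.0
    server_free_at = 0.0
    waits = []
    for i in range(len(interarrival) + 1):
        start = max(arrival, server_free_at)  # service starts when the server is free
        waits.append(start - arrival)
        if i < len(interarrival):
            server_free_at = start + service[i]
            arrival += interarrival[i]
    return waits


service_times = [3.0, 2.0, 1.0]
gaps = [2.0, 1.0, 5.0]
recursive = lindley_waiting_times(service_times, gaps)
direct = simulated_waiting_times(service_times, gaps)
```

In this example the last customer arrives after a long gap and finds the server free, so the recursion resets to zero.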

This result shows that if the service times $b_1, b_2, \ldots$ are i.i.d., then the $x_n$'s are also i.i.d., hence the process $w_n$ is a random walk which resets to 0 whenever it enters $(-\infty, 0)$. In order to describe the random walk $w_n$ in terms of the random walk generated by the random variables $x_n$, define:

$$S_n = x_1 + x_2 + \cdots + x_n$$

and adhere to the notation for ladder variables. Define $\nu$ as the subscript for which $S_1 \ge 0, S_2 \ge 0, \ldots, S_{\nu-1} \ge 0$, but $S_\nu < 0$. By definition, $\nu$ is the first descending ladder epoch, denoted by $\mathcal{T}_1^-$. Up to this epoch, the customers $1, 2, \ldots, \nu-1$ had positive waiting times $w_1 = S_1, w_2 = S_2, \ldots, w_{\nu-1} = S_{\nu-1}$. The $\nu$th customer is the first one to find the server free. The first conclusion is (Feller, 1971): The descending ladder epochs correspond to the customers who find the server free (i.e. $w_k = 0$) and constitute a renewal process with recurrence times distributed as $\mathcal{T}_1^-$ (since the continuation of the random walk beyond epoch $\mathcal{T}_1^-$ is a probabilistic replica of the entire random walk).

Figure 2.6. Two Situations of Server: (a) server busy; (b) server free

Suppose now that customer $\nu-1$ arrived at epoch $t$. His waiting time was $w_{\nu-1} = S_{\nu-1}$, and the epoch of his departure is $t + w_{\nu-1} + b_{\nu-1}$ (see Figure 2.6). The customer $\nu$ arrived at epoch $t + a_\nu$, when the server is free. Thus, the time for which the server is free is given by

$$\text{free time} = t + a_\nu - (t + w_{\nu-1} + b_{\nu-1}) = a_\nu - b_{\nu-1} - w_{\nu-1} = -x_\nu - w_{\nu-1} = -x_\nu - S_{\nu-1} = -S_\nu.$$

But by definition $S_\nu$ is the first descending ladder height $\mathcal{H}_1^-$. Thus, as the second conclusion we have: The durations of free periods are i.i.d. random variables which constitute a renewal process with recurrence time distributed as $-\mathcal{H}_1^-$.

To summarize, customer number $k$ which arrives at epoch $\mathcal{T}_1^- + \cdots + \mathcal{T}_r^-$ is the $r$th customer that finds the server free. At the epoch of his arrival the server has been free for $-\mathcal{H}_r^-$ time units; at the same time the descending ladder height is given by $S_k = \mathcal{H}_1^- + \cdots + \mathcal{H}_r^-$.

The remarkable statistical property of the random walk, namely that it contains two imbedded renewal processes (the ladder epochs and the ladder heights), together with the fact that the continuation of the random walk after the first ladder epoch (and after each subsequent ladder epoch) is a probabilistic replica of the entire random walk, enables important results to be found about the distribution of the ladder variables in terms of the first ladder variables. The next analysis of Page's cumsum tests (2-29) and (2-35) is easy to follow in terms of ladder variables.

c.  Page  Procedure  Revisited 

The first version of Page's test, presented by (2-28) and (2-29), implies that the time $k$ at which $\min_{0 \le j \le n} S_j$ attains its minimum is a descending ladder epoch. Hence, at that time $k$ the test is renewed. The descending ladder epochs indicate the times where the change is more likely to happen. A change is declared when the test is terminated, i.e., the "distance" from the last descending epoch is at least $a$. For the repeated one-sided Wald SPRT version (2-35), the descending ladder epochs are defined at the times where the random walk hits the lower boundary 0 (see Figure 2.7). At these ladder epochs the test is renewed. Once again, the test measures the "distance" between the current value of the random walk and the last ladder epoch. This distance is equivalent to the "statistical distance" between $P_0$ and $P_1$ as defined by (2-26) or the disorder distance. Notice that this analysis was done
for  detecting  upward  changes.  Similarly,  for  detecting  downward  changes  we 
will  use  ascending  ladder  epochs  and  the  test  terminates  when  the  test 
reaches  a  "distance"  a  below  the  last  ascending  ladder  variable. 

Three  important  observations  can  be  made:  The  first,  as  pointed 
out  before  in  the  analysis  of  Page's  test  in  a  composite  hypothesis  testing,  is 
that  it  is  worthwhile  to  bias  the  detector  if  it  is  known  that  before  the  disorder 
occurs,  the  test  will  have  zero  mean.  It  can  be  shown  from  Figure  2.8  that  a 
random  walk  with  negative  drift  will  improve  the  chance  of  rapid  disorder 
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detection  under  the  restriction  of  low  false  alarm  rate,  since  the  number  of 
ladder  epochs  will  be  larger,  resulting  in  more  renewals,  thus  having  the 
effect  of  "forgetting"  the  irrelevant  past  observations.  This  result  is  supported 
analytically  in  the  sequel  when  it  is  shown  that  the  expected  delay  for 
detection  is  reduced  by  biasing  the  test. 


Figure 2.7. Random Walks $w_n$ and $S_n$ Containing Two Renewal Processes: Ladder Epochs and Ladder Heights
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Figure  2.8.  The  Relationship  between  the  Recursive  Implementation  W„  and 
the  Random  Walk  Process  S„  (From  Feller,  1971) 
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The  second  observation  is  related  to  the  first  by  realizing  that  each 
time  the  test  is  renewed  (having  the  effect  of  resetting  the  test,  hence 
"forgetting"  the  past),  the  test  or  detector  behaves  like  an  adaptive  detector, 
since  when  reaching  the  descending  ladder  epoch,  the  past  noisy  observations 
containing  no  data  about  a  possible  change  can  be  ignored.  The  fact  that  at 
each  ladder  point  the  likelihood  of  a  change  is  the  greatest  implies  that  since 
the disorder is a local phenomenon, the detection will occur if the signal-to-noise ratio and/or disorder duration is large enough, resulting in a threshold passage. Thus, this adaptive detector acts like a low-pass filter which filters the incoming signal except the changes.

The third observation leads to the analytical equivalence between the Page tests (2-28, 2-29) and (2-35) and is found in Siegmund (Siegmund, 1985).


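Let $S_n = \sum_{i=1}^{n} x_i$. By backward recursion,

$$\begin{aligned}
w_n &= \max(0,\, w_{n-1} + x_n) = \max\big(0,\, (w_{n-2} + x_{n-1})^+ + x_n\big) \\
&= \max(0,\, w_{n-2} + x_{n-1} + x_n,\, x_n) \\
&= \max\big(0,\, (w_{n-3} + x_{n-2})^+ + x_{n-1} + x_n,\, x_n\big) \\
&= \max(0,\, w_{n-3} + x_{n-2} + x_{n-1} + x_n,\, x_{n-1} + x_n,\, x_n) \\
&= \ldots = \max(0,\, S_n,\, S_n - S_1,\, \ldots,\, S_n - S_{n-1}) \\
&= \max_{0 \le k \le n}(S_n - S_k) = S_n - \min_{0 \le k \le n} S_k
\end{aligned} \tag{2-36}$$

which shows the equivalent interpretation of the two procedures. Namely, the queueing variable $w_n$ measures the departure of the process $S_n$ from its last minimum.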
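
The identity between the recursive form and the height of the random walk above its running minimum can be verified numerically. The sketch below is only a sanity check under assumed inputs (Gaussian increments with negative drift, a fixed seed); the function names are hypothetical.

```python
import random
from typing import List, Sequence


def lindley_path(xs: Sequence[float]) -> List[float]:
    """w_n = max(0, w_{n-1} + x_n) with w_0 = 0."""
    w, out = 0.0, []
    for x in xs:
        w = max(0.0, w + x)
        out.append(w)
    return out


def height_above_minimum(xs: Sequence[float]) -> List[float]:
    """S_n - min_{0 <= k <= n} S_k with S_0 = 0."""
    s, s_min, out = 0.0, 0.0, []
    for x in xs:
        s += x
        s_min = min(s_min, s)
        out.append(s - s_min)
    return out


rng = random.Random(0)  # fixed seed so the check is reproducible
xs = [rng.gauss(-0.25, 1.0) for _ in range(500)]
max_gap = max(abs(u - v) for u, v in zip(lindley_path(xs), height_above_minimum(xs)))
```

The two paths agree up to floating-point rounding, which is why either form can be used to implement Page's test.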
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This  shows  also  that  the  distribution  of  and  its  asymptotic 
behavior  can  be  studied. 

Theorem (Feller, 1971): The distribution of the queueing variable $w_n$ is identical to the distribution of the random variable $M_n$, where:

$$M_n = \max\{0, S_1, \ldots, S_n\} \tag{2-37}$$

□ 


(2-38) 

This relation is used in the sequel to derive an expression for the ARL function of the test (see 2-52).

The Wiener-Hopf integral equation (Feller, 1971) can be used to find an explicit solution to the probability distribution $m(x) = \Pr\{M_n \le x\} = \Pr\{w_n \le x\}$.

d.  The  Page  Test  as  a  Sequential  Maximum  Likelihood  Detector 

Let the problem be specified as in (2-26). When Page's test is implemented with the log-likelihood ratio (repeated SPRT test), it is equivalent to a sequential implementation of the maximum likelihood detector. The log-likelihood function $l(x_1, \ldots, x_n)$ is given by:


where: 

1=1 

Hence, 

$$\Pr\{w_n \ge a\} = \Pr\{N^*(a) \le n\}$$

where

$$N^*(a) = \inf\{n : S_n \ge a\}$$

and

$$\lim_{n \to \infty}\Pr\{w_n \ge a\} = \Pr\{N^*(a) < \infty\}$$
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2.  Optimal  Properties  of  the  Page  Test 

In this section we will review two important results due to Lorden (Lorden, 1971). The first result (the following theorem) will enable the use of Wald approximations (2-16, 2-17) to find an efficient way of calculating the performance measure for Page's test. The second result establishes the asymptotic optimality of Page's test in the sense of Lorden's criterion.


a. Bounds on the Performance of Quickest Detection for Repeated One-sided Tests

Let $N$ be the stopping variable of a one-sided Wald test:

$$N = \inf\{n : S_n \ge a\}$$

for some statistics $\{S_n\}$ defined as functions of the i.i.d. observations $x_1, x_2, \ldots$ Let $N_k$ be the stopping variable of the same test applied to $x_k, x_{k+1}, \ldots$, for $k = 1, 2, \ldots$, and define

$$N^* = \min_{k \ge 1}\{N_k + k - 1\}.$$

$N^*$ is the extended stopping variable of the one-sided test, which stops the first time one of the sequence of tests $\{N_k\}$ applied to $x_k, x_{k+1}, \ldots$ stops.
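Theorem (Lorden, 1971): Let $N$ be a one-sided stopping variable with respect to $x_1, x_2, \ldots$ such that

$$\Pr\{N < \infty \mid H_0\} \le \alpha.$$

Let $N_k$ denote the one-sided stopping variable obtained by applying $N$ to $x_k, x_{k+1}, \ldots$, and define

$$N^* = \min_{k \ge 1}\{N_k + k - 1\}.$$

Then,

$$E_0\{N^*\} \ge \frac{1}{\alpha} = \gamma \tag{2-40}$$

and for any alternative distribution $P_1$,

$$\bar{E}_1\{N^*\} \le E_1\{N\}. \qquad \square$$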

Notice that the one-sided Wald test is applied to $x_k, x_{k+1}, \ldots$, stopping the first
time  one  of  these  tests  stops.  This  result  can  now  be  viewed  by  using  the 
renewal  argument:  Each  time  the  test  statistic  (2-35)  falls  below  zero,  Sj  is 
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reset  to  zero  indicating  that  a  new  test  is  starting  from  k+1  and  so  on  until  the 
first  test  reaches  the  stopping  boundary. 

This  theorem  establishes  the  optimality  of  Page's  test  (N*)  versus 
unrepeated  one-sided  tests  (N).  This  result  will  be  used  in  the  next  section  to 
derive  a  performance  measure  for  Page's  test. 

b.  Asymptotic  Optimality  of  Page's  Test 

Recall  Lorden's  criterion  definition  for  the  performance  of 
cumsum  procedures.  The  stopping  rules  N  must  satisfy 

$$E\{N \mid \nu = \infty\} = E_0\{N\} \ge \gamma.$$

The quickness with which the stopping rule detects a true change in distribution is evaluated by $\bar{E}_1\{N\}$ given by (2-27a).

The problem of minimizing $\bar{E}_1\{N\}$ subject to the constraint $E_0\{N\} \ge \gamma$ becomes more interesting if we replace the distribution $P_1$ by the Darmois-Koopman family of distributions $\{P_\theta, \theta \in \Theta\}$ with $\theta$ unknown, and try to achieve small $\bar{E}_\theta\{N\}$ (defined as $\bar{E}_1\{N\}$) for each $\theta$ subject to $E_0\{N\} \ge \gamma$. To handle composite $\{P_\theta\}$, one-sided sequential tests of $P_0$ vs. $\{P_\theta\}$ are applied to $x_k, x_{k+1}, \ldots$ for $k = 1, 2, \ldots$, stopping the first time one of these tests stops.

Lorden showed that we can simultaneously minimize $\bar{E}_\theta\{N\}$ for each $\theta$ asymptotically as $\gamma \to \infty$ for a wide class of tests. Lorden's main result was that Page's test ($N^*$), implemented with the log-likelihood function, a zero score, and a stopping boundary, belongs to this class. The following result will show that the Page test achieves the lowest possible $\bar{E}_1\{N^*\}$, making it an optimal test both when $P_1$ is known and when $P_1$ is unknown (composite testing case).
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Let  N  and  N*  be  defined  as  in  the  last  section.  If  N  is  the  stopping 
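variable of a one-sided SPRT of $P_0$ vs. $P_1$ with likelihood-ratio boundary $1/\alpha$, then by using Wald's approximation (2-7) we have that

$$E_1\{N\} \sim |\log\alpha| / I(\theta_1, \theta_0) \quad \text{as } \alpha \to 0$$

where $I(\theta_1, \theta_0)$ is the information number as defined by (2-23). Applying the last theorem, we obtain that $N^*$ (Page's procedure) satisfies $E_0\{N^*\} \ge \alpha^{-1}$ and $\bar{E}_1\{N^*\}$ is asymptotically at most $|\log\alpha| / I(\theta_1, \theta_0)$ as $\alpha \to 0$, and this is asymptotically the best we can do. In other words:

$$\inf \bar{E}_1\{N\} \sim \bar{E}_1\{N^*\} \sim \frac{\log\gamma}{I(\theta_1, \theta_0)} \quad \text{as } \gamma = \alpha^{-1} \to \infty. \tag{2-41}$$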
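
To give a feel for the size of the asymptotic delay $\log\gamma / I(\theta_1, \theta_0)$, the following sketch evaluates it for an assumed Gaussian mean-shift model, for which the Kullback-Leibler number is $(\mu_1 - \mu_0)^2 / (2\sigma^2)$. The function names and the Gaussian model are illustrative assumptions, not part of the source.

```python
import math


def kl_gaussian_mean_shift(mu0: float, mu1: float, sigma: float) -> float:
    """Information number I(theta1, theta0) for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return (mu1 - mu0) ** 2 / (2.0 * sigma ** 2)


def asymptotic_delay(gamma: float, info: float) -> float:
    """First-order approximation log(gamma) / I of the worst-case detection delay."""
    return math.log(gamma) / info


info = kl_gaussian_mean_shift(0.0, 1.0, 1.0)
delay = asymptotic_delay(math.exp(5.0), info)
```

The delay grows only logarithmically with the required mean time between false alarms $\gamma$, and is inversely proportional to the information number.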

Lorden also showed that $\bar{E}_\theta\{N\}$ can be minimized simultaneously for each $\theta$, asymptotically as $\gamma \to \infty$, for a wide class of tests.

Moustakides (Moustakides, 1986) extended these results to the non-asymptotic case where $\gamma$ is finite.

F. PERFORMANCE ANALYSIS OF THE PAGE TEST

In  1954,  Page  (Page,  1954)  introduced  a  control  chart  procedure  based  on 
the repeated one-sided Wald-SPRT test with boundaries $(0, a)$, zero being the
renewal  boundary  and  a  the  stopping  boundary. 

Let  the  problem  formulation  be  according  to  (2-26),  that  is, 

$$N^* = \inf\{n : S_n \ge a\}$$

and let $L(s, \theta)$ be the ARL of this test with initial score $S_0 = s$ when $\{x_i\}$ are i.i.d. $F(x \mid \theta)$ distributed, i.e.,

$$L(s, \theta) = E[N^* \mid S_0 = s, \theta].$$

Consider now Wald's test

$$N = \inf\{n : S_n \le 0 \ \text{or}\ S_n \ge a\}.$$

Similarly, let $L_w(s, \theta)$ denote the ARL of Wald's test,

$$L_w(s, \theta) = E[N \mid S_0 = s, \theta]$$

and let $Q_w(s, \theta)$ be the Operating Characteristic (OC) of the same Wald test, that is

$$Q_w(s, \theta) = P(S_N \le 0 \mid S_0 = s, \theta).$$

Then, the ARL of the Page test $L(s, \theta)$ is given by (Page, 1954)

$$L(s, \theta) = Q_w(s, \theta)\,L(0, \theta) + L_w(s, \theta). \tag{2-42}$$

Lorden (Lorden, 1971) and Benveniste (Benveniste, Ed., 1986) showed that the least favorable delay for detection, $D$, as defined by Lorden for Page's test occurs when the test statistic is zero when the change or the disorder occurs, i.e., $S_{\nu-1} = 0$, since the test statistic then has the longest path to travel towards the stopping boundary. Thus, the ARL function with initial score $S_0 = 0$ determines both the mean time between false alarms $T$ and the delay for detection $D$, as given by

$$T = L(0, \theta_0) = E_0(N^*) = \frac{L_w(0, \theta_0)}{1 - Q_w(0, \theta_0)} \tag{2-43}$$

$$D = L(0, \theta_1) = E_1(N^*) = \frac{L_w(0, \theta_1)}{1 - Q_w(0, \theta_1)} \tag{2-44}$$
Possible  ideal  and  real  ARL  functions  are  presented  in  Figure  2.9. 
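
Page's relation $L(0, \theta) = L_w(0, \theta) / [1 - Q_w(0, \theta)]$ can be checked exactly on a toy model. The sketch below assumes score increments of $\pm 1$ (up with probability `p`) and an integer threshold `a`; this model and the function names are illustrative assumptions. The Wald quantities come from the classical gambler's-ruin formulas, and the direct ARL comes from solving the first-step equations of the reflected walk.

```python
from typing import Tuple


def wald_from_zero(p: float, a: int) -> Tuple[float, float]:
    """Return (L_w(0), 1 - Q_w(0)) for the Wald test with boundaries 0 and a.

    From S_0 = 0 a down-step stops the test at the lower boundary; an up-step
    leads to state 1, from which gambler's-ruin formulas apply.
    """
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        hit_upper = 1.0 / a          # P(reach a before 0 | start at 1)
        duration = float(a - 1)      # expected duration from state 1
    else:
        r = q / p
        hit_upper = (1.0 - r) / (1.0 - r ** a)
        duration = 1.0 / (q - p) - (a / (q - p)) * hit_upper
    return 1.0 + p * duration, p * hit_upper


def page_arl_formula(p: float, a: int) -> float:
    """ARL of Page's test via L(0) = L_w(0) / (1 - Q_w(0))."""
    l_w, one_minus_q = wald_from_zero(p, a)
    return l_w / one_minus_q


def page_arl_direct(p: float, a: int) -> float:
    """ARL of the reflected walk w_n = max(0, w_{n-1} + x_n) stopped at a.

    With A(i) the ARL from state i and d_i = A(i) - A(i+1), the first-step
    equations give d_0 = 1/p and d_i = (1 + q d_{i-1}) / p, and A(0) = sum d_i.
    """
    q = 1.0 - p
    d = 1.0 / p
    total = d
    for _ in range(1, a):
        d = (1.0 + q * d) / p
        total += d
    return total
```

Evaluating both functions with `p < 0.5` mimics the pre-change regime (large ARL, i.e. $T$), while `p > 0.5` mimics the post-change regime (small ARL, i.e. $D$).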
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(a)  Real  and  Ideal  ARL  Functions  for  Testing  0<0o  against  0>  0q 


(b) Real and Ideal ARL Functions for Testing $\theta = \theta_0$ against $\theta \ne \theta_0$

Figure  2.9.  Possible  Ideal  and  Real  ARL  Functions 

Remark: The local properties of cumsum tests can be measured in terms of the derivative of the function $L(\theta)$ at $\theta_0$, since the local properties of a test measure the test's performance as $\theta_1 \to \theta_0$ (when the statistical "distance" between the two hypotheses tends to zero), meaning weak signals or a low signal-to-noise ratio case.

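It was shown (Nikiforov, 1986) that convenient measures can be defined by using the derivative $\xi$ of the ARL function for determining a local approximation (see Figure 2.8), where $\xi$ is defined as

$$\xi = \left.\frac{\partial L(\theta)}{\partial\theta}\right|_{\theta=\theta_0} \quad \text{if } \xi \ne 0$$

and if $\xi = 0$, the local approximation is given by

$$\left.\frac{\partial^2 L(\theta)}{\partial\theta^2}\right|_{\theta=\theta_0} \quad \text{if } \xi = 0. \tag{2-45}$$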


This  forms  the  basis  for  what  is  called  in  the  sequel  the  local  approach  for 
cumsum  tests,  resulting  in  local  sequential  tests. 


1.  The  Lorden  Approximations 

As  shown  in  the  last  section  and  given  by  (2-40),  Lorden  established 
bounds  on  the  delay  for  detection  and  the  false  alarm  rate  for  Page’s  cumsum 
test  in  terms  of  the  Wald  sequential  test. 

Lorden's theorem (2-40) can be applied to Page's test with nonlinearity $g(x)$ and a zero initial score to obtain a new bound. Using Wald's lower bound (2-16) as derived for his one-sided sequential test we obtain

$\alpha \le \exp\{-h(\theta_0)\,a\}.$

Thus, from (2-40), the mean time between false alarms can be lower bounded by

$T = E_0\{N^*\} \ge \exp\{h(\theta_0)\,a\},$  (2-46)

where $h(\theta_0)$ is the unique non-zero root of the moment generating function identity (2-11). Using (2-17) and applying Lorden's theorem, the delay for detection can be upper bounded by:

$D = E_1\{N^*\} \le \frac{a + \gamma(\theta_1)}{E\{g(x) \mid \theta_1\}}.$  (2-47)

Equations  (2-46)  and  (2-47)  are  known  as  Lorden's  bounds. 

Remark:  Notice  that  the  mean  time  between  false  alarms  (2-46)  is  an 
exponential  function  of  the  stopping  bound  a. 
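
As a rough numerical illustration of Lorden's bounds, the sketch below evaluates (2-46) and a delay bound of the form (a + excess)/E{g(x) | theta_1}, the form that appears in (2-50). It assumes that the identity (2-11) reads E{exp(h g(x)) | theta_0} = 1, so that for a score distributed N(m, v) before the change the non-zero root is h = -2m/v. The function names and the example values (threshold 5, unit variance, means 0 and 1) are illustrative and not taken from the text.

```python
import math


def mgf_root_gaussian(score_mean, score_var):
    """Non-zero root h of E[exp(h*g)] = 1 when g ~ N(score_mean, score_var).

    Assumed form of identity (2-11); for a Gaussian score the root is -2m/v.
    """
    return -2.0 * score_mean / score_var


def lorden_bounds(a, h0, drift_after, excess=0.0):
    """Return (lower bound on T, upper bound on D) for threshold a.

    excess stands for the excess over the boundary (zero if ignored).
    """
    t_lower = math.exp(h0 * a)
    d_upper = (a + excess) / drift_after
    return t_lower, d_upper


# Hypothetical example: mean jumps from 0 to 1, unit variance,
# score g(x) = x - 0.5 (mean -0.5 before the change, +0.5 after).
h0 = mgf_root_gaussian(score_mean=-0.5, score_var=1.0)
t_lower, d_upper = lorden_bounds(a=5.0, h0=h0, drift_after=0.5)
print(h0, t_lower, d_upper)
```

Doubling the threshold in this example squares the bound on T while only doubling the bound on D, which is the exponential versus linear behavior noted in the remark above.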


2.  Wald  Approximations 

Similar results can be obtained by using the approach proposed by Nikiforov (Nikiforov, 1986). Recall the approximations (2-14) and (2-15) obtained for the OC and ARL functions of the two-sided Wald sequential test. These approximations can be used with the modification $b \uparrow 0$ (zero renewal boundary for Page's test). Once again, a lower bound for T and an upper bound for D will be derived.


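a. Lower Bound for T when $E\{g(x) \mid \theta_0\} < 0$:

Using Page's result (2-43) and the bound (2-15) yields:

$T = L(0,\theta_0) = \lim_{b \uparrow 0} \frac{L_w(0,\theta_0)}{1 - Q(0,\theta_0)} \ge \lim_{b \uparrow 0} \frac{1}{E\{g(x) \mid \theta_0\}} \left[ a + \gamma(\theta_0) + b\,\frac{Q(\theta_0)}{1 - Q(\theta_0)} \right]$

Notice that the right hand side is a decreasing function of $Q(\theta_0)$ (since $\frac{d}{dQ}\left(\frac{Q}{1-Q}\right) > 0$ and $b < 0$). Hence, using the upper bound for $Q$ obtained for $h(\theta_0) > 0$ given by (2-14), we obtain:

$T \ge \lim_{b \uparrow 0} \frac{1}{E\{g(x) \mid \theta_0\}} \left[ a + \gamma(\theta_0) + b\,\frac{\delta(\theta_0)\exp\{h(\theta_0)a\} - 1}{1 - \exp\{h(\theta_0)b\}} \right]$

The last term is a function of b, but both its numerator and denominator approach zero when $b \uparrow 0$. Using L'Hopital's rule and the fact that $\delta(\theta_0) \ge 1$ we obtain (Broder, 1990):

$T \ge \frac{1}{E\{g(x) \mid \theta_0\}} \left[ a + \gamma(\theta_0) + \frac{1 - \exp\{h(\theta_0)a\}}{h(\theta_0)} \right]$  (2-48)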


Remark: By (2-49) the mean time between false alarms is a quadratic function of the stopping bound a. Recall that by (2-46), when $E\{g(x) \mid \theta_0\} < 0$, the mean time between false alarms is an exponential function of the stopping bound. Hence, once again it is demonstrated that it is worthwhile to bias the test to have a negative drift before the change, resulting in T being an exponential function of the stopping bound instead of a quadratic one (Broder, 1990).


c. Upper Bound for D

Consider now the delay for detection. For detecting an upward change, $E\{g(x) \mid \theta_1\} > 0$ after the disorder, which results in $h(\theta_1) < 0$. Using (2-44), (2-14) and (2-15) in the same manner as before we obtain:

$D = L(0,\theta_1) = \frac{L_w(0,\theta_1)}{1 - Q(0,\theta_1)} \le \frac{1}{E\{g(x) \mid \theta_1\}} \cdot \frac{[a + \gamma(\theta_1)][1 - Q(\theta_1)] + b\,Q(\theta_1)}{1 - Q(\theta_1)}$

Once again, the right hand side is a decreasing function of $Q(\theta_1)$, and since $b < 0$ the inequality is preserved. Thus, as $Q(\theta_1) \downarrow 0$, we can replace $Q(\theta_1)$ with zero and obtain

$D \le \frac{a + \gamma(\theta_1)}{E\{g(x) \mid \theta_1\}}$  (2-50)

which is consistent with Lorden's bound (2-47).

3.  Asymptotic  Performance  and  Measures 

For all disorder detection schemes the pair (T,D) determines the detector performance, just as $P_D$ versus $P_{FA}$ ($P_D$ being the probability of detection and $P_{FA}$ the probability of false alarm) determines the performance of a classical detector.

As shown, the ARL function uniquely determines the pair (T,D). Thus, it is in our interest to examine its asymptotic performance. Hinkley (Hinkley, 1972) used an asymptotic performance measure for the nonlinearity $g(x)$ while using $P_D$ and $P_{FA}$ as performance criteria. This measure was derived while calculating the efficiency of the cumsum procedure. The proposed measure was

$\log E\{\exp\{-h(\theta_0)\,g(x)\} \mid \theta_1\}.$

Recently (Broder, 1990), another performance measure was proposed, resulting in an alternative technique which allows recursive computation of the ARL as the stopping bound a increases, avoiding the complicated numerical integration needed to generate the performance curves (solution of Fredholm type integral equations).


a.  Asymptotic  Approximation  of  the  ARL  Function 

Central limit theorem for renewal processes (Ross, 1989): For large t, a renewal process $N(t)$ is approximately normally distributed with mean $t/\mu$ and variance $t\sigma^2/\mu^3$, where $\mu$ ($\mu \ne 0$) and $\sigma^2$ are respectively the mean and the variance of the interarrival distribution:

$\lim_{t \to \infty} P\left\{ \frac{N(t) - t/\mu}{\sqrt{t\sigma^2/\mu^3}} < x \right\} = \Phi(x)$  (2-51)

where $\Phi(x)$ is the Gaussian cumulative distribution function:

$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$

Khan (Khan, 1981) used this result to show the asymptotic normality (in the sense of $a \to \infty$) of the run length $N^*$ of Page's test with a stopping boundary a under $P(\theta_1)$. Define

$\mu = E\{g(x) \mid \theta_1\}, \quad \sigma^2 = \mathrm{var}[g(x) \mid \theta_1];$

then

$\frac{N^* - a/\mu}{\sqrt{a\sigma^2/\mu^3}} \to N(0,1)$

where N(0,1) is a Gaussian distribution with zero mean and unit variance.
Using this asymptotic distribution and the results derived in (2-38), a new approximation can be established for the asymptotic probability that the average delay is less than a given threshold x:

$\Pr\{L(\theta_1) < x\} = \Phi\left( \frac{x - a/\mu}{\sqrt{a\sigma^2/\mu^3}} \right)$  (2-52)
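
The approximation (2-52) is straightforward to evaluate numerically. The following is a minimal sketch, assuming that mu and sigma2 denote the mean and variance of g(x) after the change and that the threshold a is large enough for the normal approximation to be reasonable; the function names and the example values are illustrative only.

```python
import math


def std_normal_cdf(z):
    """Gaussian cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def delay_cdf(x, a, mu, sigma2):
    """Normal approximation to Pr{delay < x} for threshold a, as in (2-52).

    mu and sigma2 are the mean and variance of g(x) after the change.
    """
    mean_delay = a / mu
    std_delay = math.sqrt(a * sigma2 / mu ** 3)
    return std_normal_cdf((x - mean_delay) / std_delay)


# Hypothetical example: a = 10, mu = 0.5, sigma2 = 1 gives a mean delay of 20.
p_at_mean = delay_cdf(20.0, a=10.0, mu=0.5, sigma2=1.0)
p_later = delay_cdf(30.0, a=10.0, mu=0.5, sigma2=1.0)
print(p_at_mean, p_later)
```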


b.  Alternative  Asymptotic  Performance  Evaluation 

An alternative way to evaluate the Page test under different nonlinearities for various noise distributions has been shown by Broder (Broder, 1990). Define an asymptotic performance measure

$\eta = \lim_{ARL(\theta_0) \to \infty} \frac{\log ARL(\theta_0)}{ARL(\theta_1)} = \lim_{T \to \infty} \frac{\log T}{D} = \lim_{a \to \infty} \frac{\log T}{D}$  (2-53)

Notice that for $\eta$ to reach a finite bound, both T and D approach infinity as $a \to \infty$; thus $\eta$ reflects the asymptotic performance and is the reciprocal of the slope of the (D, log T) performance curve. This performance measure can be interpreted in two ways: first, as the ratio of the run lengths for the two hypotheses, so that a larger $\eta$ indicates better asymptotic performance; second, as an Asymptotic Relative Efficiency (ARE) between two tests.
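Recall that for a fixed mean time between false alarms (large enough):

$ARE_{1,2} = \frac{\eta_1}{\eta_2} = \frac{\log(T_1)/D_1}{\log(T_2)/D_2} = \frac{D_2}{D_1} = \frac{L_2(\theta_1)}{L_1(\theta_1)}$

hence resulting in the delay ratio of the two tests.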

Using Lorden's approximations (2-46) and (2-47) for the Page test, and ignoring the "excess over the boundaries", we get:

$\log T \ge h(\theta_0)\,a$

$D \le \frac{a}{E\{g(x) \mid \theta_1\}}$  (2-54)

hence

$\eta \ge h(\theta_0)\,E\{g(x) \mid \theta_1\} = \underline{\eta}.$  (2-55)

This lower bound $\underline{\eta}$ can be defined as the asymptotic performance measure, thus being a convenient way to "measure" Page's test using different nonlinearities $g(x)$ for various noise distributions.

Notice that the lower bound $\underline{\eta}$ can be used as an alternative way to estimate the expected delay given the desired large mean time between false alarms.

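Property: If g is the log-likelihood ratio, $g(x) = \log[f(x \mid \theta_1)/f(x \mid \theta_0)]$, then under any noise distribution the bound is tight, i.e., $\eta = \underline{\eta} = I(\theta_1,\theta_0)$, where $I(\theta_1,\theta_0)$ is the Kullback-Leibler information number (2-23).

Proof: Since

$E\left\{ \exp\left[ \log \frac{f(x \mid \theta_1)}{f(x \mid \theta_0)} \right] \,\Big|\, \theta_0 \right\} = 1,$

using the moment generating function identity (2-11) it becomes obvious that

$h(\theta_0) \equiv 1.$

Hence,

$\underline{\eta} = E\{g(x) \mid \theta_1\} = I(\theta_1,\theta_0)$

and

$\lim_{T \to \infty} \frac{\log T}{D} = I(\theta_1,\theta_0).$

Recall that by definition $T = E_0\{N^*\} \ge \alpha^{-1}$, where $\alpha$ is the probability of false alarm. Hence

$\frac{|\log \alpha|}{D} = I(\theta_1,\theta_0), \quad \alpha \to 0$

or

$D \sim \frac{|\log \alpha|}{I(\theta_1,\theta_0)}, \quad \alpha \to 0.$  (2-56)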
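
The two facts used in the proof can be checked numerically for a simple case. The sketch below takes a Gaussian mean shift (means 0 and 1, unit variance, all chosen here purely for illustration), assumes the identity (2-11) has the form E{exp(h g(x)) | theta_0} = 1, and uses a plain trapezoidal rule; it confirms that h = 1 satisfies the identity for the log-likelihood ratio and that E{g(x) | theta_1} equals the Kullback-Leibler number, which for this case is 0.5.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expect(fn, mu, sigma, n=20001, width=10.0):
    """Trapezoidal estimate of E[fn(x)] for x ~ N(mu, sigma^2)."""
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * fn(x) * gauss_pdf(x, mu, sigma)
    return total * step


# Illustrative values, not taken from the text.
mu0, mu1, sigma = 0.0, 1.0, 1.0


def llr(x):
    """Log-likelihood ratio log f(x|theta_1)/f(x|theta_0) for the mean shift."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)


mgf_at_h1 = expect(lambda x: math.exp(llr(x)), mu0, sigma)  # should be close to 1
kl_number = expect(llr, mu1, sigma)                          # should be close to 0.5
print(mgf_at_h1, kl_number)
```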


Hence, (2-55) can be seen as a generalization of Lorden's result (2-41) to any nonlinearity function. The root $h(\theta_0)$ of the moment generating function identity "scales" (2-41); thus (2-55) establishes a general bound which can be evaluated for any nonlinearity $g(x)$ under any noise environment. We will be interested in the cases where strict equality holds ($\eta = \underline{\eta}$), which enables us to get a precise relationship between the delay and the false alarm rate. In the next chapter we will see some examples for which $\eta = \underline{\eta}$.

C. SUMMARY

In this chapter we show that detecting a disorder presented in the multiple hypothesis framework (1-2) can be done by using cumsum type procedures. One of these procedures, called the Page test, was presented and investigated in depth. Using renewal theory and ladder variables we present a new technique to observe the properties of Page's test. Three observations are shown: first, it is worthwhile to bias the test if it is known that before the disorder the mean of the statistic is zero; second, Page's test behaves like an adaptive detector in the sense that the ladder epochs form a local minima (or maxima) process in which the past observations which do not contribute relevant information about the change are forgotten. Finally, we showed the equivalent representation of Page's test in the off-line and on-line versions.

Page's test implemented with the log-likelihood nonlinearity is shown to be the MLE of the change time (within the multiple hypotheses testing (1-2) framework). Using Lorden's results, the asymptotic optimality of Page's test is obtained in the sense that Page's test implemented with the log-likelihood nonlinearity is the optimal stopping rule, that is, the average delay for detection is minimized subject to a false alarm rate which tends to zero. Thus, the log-likelihood nonlinearity is shown to be the optimal nonlinearity, and therefore Page's test, which is the MLE for this case, is shown to be the quickest detector for the disorder problem.

Finally, the performance evaluation of Page's test was derived. The main results are what are called the Lorden approximations for the mean time between false alarms T (2-46) and delay D (2-47) and, similarly, the Wald approximations for T (2-48) and D (2-50).

In addition, using Broder's results, the asymptotic performance measure is shown to be lower bounded. The problem of how informative the bound is for different nonlinearities will be analyzed in the next chapter. Here we show that for the optimal nonlinearity, the log-likelihood, the bound is tight, i.e., the bound provides all the information needed to predict the performance of Page's test. Finally, a new simple generalization of Lorden's result was shown for any nonlinearity function in any noise environment.


III. THE APPLICATION OF PAGE'S TEST TO PARAMETRIC AND NONPARAMETRIC CHANGE DETECTION

A.  INTRODUCTION 

In the last chapter it was shown that implementing Page's test with the log-likelihood ratio nonlinearity results in the Maximum Likelihood Estimator (MLE) of the change time. Furthermore, it is the quickest detector of the disorder. The problem becomes much more complicated when the model parameters after the change are not known. In this case, the unknown random variables, the change time $\nu$ and the model parameters $\theta$, have to be estimated. Thus the detection problem can be presented in the estimation framework. Joint estimation of $\nu$ and $\theta$ is a very difficult task because the disorder occurs at an unknown time, and the presence of several unknown parameters forces the use of suboptimal detectors. Here we present some competing ad hoc methods used for detection and, if possible, also estimation of the change time and the model's parameters.

1.  Likelihood  Oriented  Methods 

In situations such as detection of an unknown change magnitude of Gaussian signals it is possible to perform the joint estimation of $\nu$ and the unknown parameters $\theta$ (Basseville, 1988). In such cases, the detection approach consists of replacing the unknown jump magnitude of the model parameter by its MLE. The Generalized Likelihood Ratio (GLR) test of the joint estimation becomes

$\max_{1 \le \nu \le n} \max_{\theta_1} S^n(\nu,\theta_1) \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$

where $S^n(\nu,\theta_1)$ is the log-likelihood cumsum statistic. This double maximization problem of estimating both the change time and the parameters is reduced to a single maximization of the cumulative sum, since the Gaussian characteristic of the signal to be detected allows an explicit solution as a function of $\theta_1$ for the likelihood ratio test (Basseville and Benveniste, Eds., 1986, Chapter 1). Hence the change time estimate becomes

$\hat{\nu} = \arg\max_{1 \le \nu \le n} S^n(\nu,\theta_1).$

This property is still valid in a more general situation, when we consider the problem of detecting additive changes in linear models described in state-space representation, and leads to an efficient change detection algorithm with reasonable computing cost. An earlier approach consists of monitoring the innovations of a Kalman filter: because of the linearity of the system and the additive effect of the change on the system, it may be shown (Willsky and Jones, 1976) that the effect on the innovation is also additive. Moreover, the Gaussian characteristic of the state and observation noises, which ensures an explicit solution in $\theta_1$ for a likelihood ratio test, is still valid in this situation. These points were explored by Willsky and Jones (1976), who derived a recursive algorithm for the GLR test computed for the innovation of a Kalman filter designed under the no-change hypothesis. The likelihood is written in terms of the distribution of each observation conditioned on its past values; thus, the cumulative sum to be computed in this case has the form

$S_\nu^n(\theta_0,\theta_1) = \sum_{k=\nu}^{n} \log \frac{p_{\theta_1}(x_k \mid x_{k-1},\ldots,x_0)}{p_{\theta_0}(x_k \mid x_{k-1},\ldots,x_0)}$

where $\theta_1$ reflects the change in a certain parameter (change in the mean, variance, etc.). The GLR test is then

$\max_{1 \le \nu \le n} \max_{\theta_1} S_\nu^n(\theta_0,\theta_1) \underset{H_0}{\overset{H_1}{\gtrless}} \lambda.$

As mentioned above, the maximization over $\theta_1$ is explicit because of the Gaussian assumptions of white noise and additive changes; hence the test for the change time is reduced to a single maximization even in this more general situation.
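
For the scalar case of a jump of unknown magnitude in the mean of i.i.d. Gaussian samples, the reduction of the double maximization to a single one can be sketched as follows. The closed form used here is the standard result for this simple model and is stated as an assumption rather than taken from the text: for a candidate segment of m samples, the jump estimate is the sample mean of x_k - mu0 over the segment, and the maximized log-likelihood ratio is (sum of (x_k - mu0))^2 / (2 sigma^2 m). The function name and the 0-based indexing are illustrative.

```python
def glr_mean_jump(x, mu0, sigma2):
    """GLR statistic for a mean jump of unknown size in Gaussian data.

    Returns (statistic, change_index, jump_estimate), where change_index is
    the 0-based index of the first sample after the estimated change.
    """
    best_stat, best_start, best_jump = float("-inf"), None, None
    tail_sum = 0.0
    n = len(x)
    # Scan candidate change times from the newest sample backwards so the
    # segment sum can be updated in O(1) per candidate.
    for start in range(n - 1, -1, -1):
        tail_sum += x[start] - mu0
        m = n - start
        stat = tail_sum ** 2 / (2.0 * sigma2 * m)  # inner maximization is explicit
        if stat > best_stat:
            best_stat, best_start, best_jump = stat, start, tail_sum / m
    return best_stat, best_start, best_jump


# Hypothetical noise-free example: the mean jumps from 0 to 2 at index 3.
result = glr_mean_jump([0.0, 0.0, 0.0, 2.0, 2.0, 2.0], mu0=0.0, sigma2=1.0)
print(result)
```

Only the outer search over the change time remains as a loop; a decision is made by comparing the returned statistic with a threshold.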

In the case of detecting changes in model eigenstructure, such as changes in AR or ARMA models or, equivalently, in the state transition matrix of a state-space representation of a model, the problem of the joint estimation of the change time $\nu$ and the changing parameters is more complicated. At this point we need to distinguish between two types of situations. In the first case, the signal or system is known to have the same behavior as an AR or ARMA process, so the model is descriptive enough for changes in its parameters to be detected (Basseville, 1988). The second case reflects a situation where the signal or system is not known and the main issue is to detect changes in the eigenstructure; then the AR or ARMA models are nothing but a tool for detection of such changes.


2. Simplification of GLR Tests (Two Models Approach)

In the case of segmentation of signals resulting from AR models, such as speech segmentation (Andre-Obrecht, 1988), the detection of abrupt changes in the AR parameters is performed via the comparison between a long-term model $M_0$ identified in a growing window and a short-term model $M_1$ identified in a sliding window of fixed length (see Figure 3.1). This method is shown to be a simplification of the GLR test, since for implementing the GLR the maximization over $\theta$ (the AR parameter vector) is no longer explicit because the change is not additive on the observation. Moreover, in the case of ARMA models the cumsum is no longer linear in the parameter; therefore the test becomes quite expensive, since for each possible change time r we need to use the data {r, r+1, ..., n} for identifying the AR model $M_1$ after the change and compute the log-likelihood ratio cumsum $S^n$, then maximize over r. In the case of AR models, this method is not only expensive but also leads to boundary problems (Deshayes and Picard, 1986).

The two-model approach simplifies the GLR test by using a fixed-length sliding window as opposed to the varying-length windows needed to implement the GLR test. Different statistical distance measures between the long-term and the short-term models were proposed by Appel and Brandt (1983), Segen and Sanderson (1980), Basseville and Benveniste (1983, 1986), Ishii and Iwata (1979) and Andre-Obrecht (1988). Most of these measures are based upon innovation testing, which in turn is based upon the conditional distribution of the observations.

Figure 3.1. Schemes for: a. GLR Test; b. Two Model Method


3.  The  Statistical  Local  Approach 

Another approach for overcoming the drawbacks of the GLR tests is known as the "local approach" and was introduced in change detection problems by Nikiforov (1983, 1986) for on-line detection for AR models.

The original idea of Nikiforov consists of looking for small changes in AR or ARMA models and using the Taylor expansion of the log-likelihood function. His method results in a statistic function

$g(x) = \left.\frac{\partial \log f(x \mid \theta)}{\partial \theta}\right|_{\theta=\theta_0}$

In other words, instead of monitoring the observation process $\{x_n\}$ or the innovation process, the local approach monitors $g(x)$. The key result (Deshayes and Picard, 1986) is that there exists a central limit theorem for $g(x)$, by which any change in $\theta$ is reflected in a change in the mean of $g(x)$, for which Page's test or the GLR tests can be used. Nikiforov derived two algorithms based upon cumsum tests for two different priors about the change directions. Different applications for these algorithms are described in Nikiforov (Nikiforov, 1986). Another use of these methods is in the area of recursive parameter identification. Benveniste (Benveniste, 1987) has shown that for any general recursive parameter identification algorithm

$\theta_n = \theta_{n-1} + \gamma_n H_n(\theta_{n-1}, x_n)$

where $\gamma_n$ denotes the varying gain and $H_n$ denotes the statistic, applying the local approach to the statistics $H_n(\theta_0, x_n)$, where $\theta_0$ is a reference model, enables one to transform the problem of changes in the parameter vector $\theta$ into the problem of detecting a change in the mean value of an asymptotically Gaussian distributed process which is a cumsum of the function $H(\cdot)$.

Finally, Basseville (Basseville, 1987) and Benveniste (Benveniste, 1987) introduced another use of the local approach technique, in the case of detecting changes in the AR part of a multivariable ARMA process having unknown and time-varying MA coefficients. Because the Fisher information matrix for an ARMA process is not block diagonal with respect to the AR and MA parameters (because of the coupling between the unknown monitored parameters and the unknown changing MA parameters), neither the likelihood function nor its Taylor expansion (local approach) can be used. By using instrumental statistics on the observations (Benveniste, Basseville and Moustakides, 1987), the changes in the AR portion are reflected in changes in the mean of the instrumental statistics. By looking for "small" changes in the AR coefficients, the local approach statistic, i.e., the Taylor expansion of the instrumental statistics, results in a $\chi^2$ test

$U_n^T \Sigma_n^{-1} U_n \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$

where $U_n$ is the instrumental statistic vector (which is asymptotically Gaussian) and $\Sigma_n$ is its covariance matrix.

B.  ORGANIZATION  OF  THIS  CHAPTER 

In the introduction section, different methods competing with Page's test were briefly described. Some of them enable one to detect (or estimate) the change time together with estimation of the changed parameters. Now, we will only be concerned with the quickest disorder (change) detection problem. When implemented with the log-likelihood nonlinearity, the Page test is the optimal (quickest) detector for the disorder problem, but it assumes that the observations are i.i.d. with one distribution before the disorder and another distribution after the disorder. However, when the i.i.d. assumption does not hold, other detection schemes "tuned" to the specific problem may perform better than the then suboptimal Page test. Despite these concerns about Page's test, the test will be shown to detect changes occurring at random times very efficiently. This chapter focuses on general implementation of Page's test for both parametric and nonparametric detection, and on evaluation of the test's performance for the implemented nonlinearities, by using the results in Chapter II.

Section C introduces the case of detecting jumps in the mean of Gaussian distributed observations. Both upward and downward directions are considered, as well as the case when the change magnitude is unknown. Page's test implemented for this problem (derived from the on-line point of view) and the GLR test (derived from the off-line point of view) are shown to be the same.

In Section D, the performance of Page's test is evaluated for different nonlinearities in Gaussian and Gauss-Gauss mixture noise environments. In particular we are interested in the cases where the lower asymptotic performance bound $\underline{\eta}$ is tight (i.e., $\eta = \underline{\eta}$). In the parametric framework, the problem of detecting changes in the mean and variance of Gaussian observations is shown to result in $\eta = \underline{\eta}$, for which the performance measure is easily computed. As a second example we consider a suboptimal implementation of Page's test where the distribution after the disorder is not known and, by the use of a composite hypothesis technique, a new test is derived. This locally optimal detector is based on Wolcin's test (Wolcin, 1983) and a modification (Broder, 1990), and is modified to detect energy changes occurring within frequency "windows." New performance results are obtained and shown to be consistent with the simulation results. Finally, in the nonparametric framework, the sign test is analyzed by using results from random walk theory, and shown to have the property $\eta = \underline{\eta}$.

Section E summarizes the main results of this chapter.

C. DETECTING JUMPS IN THE MEAN

We begin in this section with the simplest application of the Page test, namely the problem of a change in the mean of independent identically distributed Gaussian random variables. This problem is an important one since, as will be shown in the sequel, many complicated problems involving abrupt changes in the eigenstructure (parameter changes) can be converted to the problem of a change in the mean. Two cases are considered: first, when the means before and after the change are known, and second, when the mean after the change, and therefore the change magnitude, is unknown.


1.  Known  Means  before  and  after  the  Change 

Let $\{e_n\}$ be a Gaussian white noise sequence with variance $\sigma^2$, and let $\{x_n\}$ be the observation sequence such that

$x_n = \mu_n + e_n, \quad n = 1, 2, \ldots, N$

where:

$\mu_n = \mu_0$  if $n \le \nu - 1$,
$\mu_n = \mu_1$  if $n \ge \nu$.

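Consider now the likelihood ratio test between the "no change" hypothesis:

$H_0: \nu > N$

versus the "change" hypothesis:

$H_1: \nu \le N.$

Thus, the likelihood ratio between these two hypotheses has the following form (Basseville, 1988):

$L(\nu) = \frac{\prod_{k=1}^{\nu-1} p_0(x_k) \prod_{k=\nu}^{N} p_1(x_k)}{\prod_{k=1}^{N} p_0(x_k)}$  (3-1)

where $p_0$ and $p_1$ denote the Gaussian densities with means $\mu_0$ and $\mu_1$; therefore, its logarithm is given by:

$\ln L(\nu) = \frac{\Delta}{\sigma^2} \sum_{k=\nu}^{N} \left( x_k - \mu_0 - \frac{\Delta}{2} \right) = \frac{\Delta}{\sigma^2}\, S_\nu^N(\mu_0,\Delta)$

where $\Delta = \mu_1 - \mu_0$ is the magnitude of the jump, and

$S_\nu^N(\mu_0,\Delta) = \sum_{k=\nu}^{N} \left( x_k - \mu_0 - \frac{\Delta}{2} \right).$  (3-2)

Replacing the unknown jump time $\nu$ by its maximum likelihood estimate under $H_1$ yields (for an upward jump, $\Delta > 0$):

$\hat{\nu} = \arg\min_{1 \le \nu \le N} \left\{ \sum_{k=1}^{\nu-1} (x_k - \mu_0)^2 + \sum_{k=\nu}^{N} (x_k - \mu_1)^2 \right\} = \arg\max_{1 \le \nu \le N} S_\nu^N(\mu_0,\Delta).$

Thus, we get the following change detector:

$g_N = \max_{1 \le \nu \le N} S_\nu^N(\mu_0,\Delta) \underset{H_0}{\overset{H_1}{\gtrless}} a$  (3-3)

where a is a threshold properly chosen as addressed in Section C.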

This detector can also be described as follows: detection occurs the first time at which

$g_n = S_1^n(\mu_0,\Delta) - \min_{1 \le k \le n} S_1^k(\mu_0,\Delta) > a$  (3-4)

which is nothing but the Page-Hinkley stopping rule or cumsum algorithm, and may be computed in the following recursive manner:

$g_n = \left( g_{n-1} + x_n - \mu_0 - \frac{\Delta}{2} \right)^+$  (3-5)

where $(z)^+ = \max(z, 0)$. Thus, Page's stopping rule (derived from the on-line viewpoint) and the generalized likelihood ratio (GLR) test (derived from the off-line viewpoint) are identical. The behavior of the Page-Hinkley stopping rule is depicted in Figure 3.2.
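
The equivalence between the recursive form (3-5) and the "cumulative sum minus running minimum" form (3-4) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the increments are x_n - mu0 - delta/2, the recursion starts from zero, and the running minimum is initialized with the empty sum (value 0) so that the two forms coincide from the first sample. The function names and the data are hypothetical.

```python
def page_recursive(x, mu0, delta, a):
    """Stopping time (1-based) of the recursion g_n = max(0, g_{n-1} + z_n)."""
    g = 0.0
    for n, xn in enumerate(x, start=1):
        g = max(0.0, g + xn - mu0 - delta / 2.0)
        if g > a:
            return n
    return None  # no alarm within the record


def page_min_form(x, mu0, delta, a):
    """Stopping time (1-based) of S_n - min_k S_k > a, with S_0 = 0 included."""
    s = 0.0
    s_min = 0.0
    for n, xn in enumerate(x, start=1):
        s += xn - mu0 - delta / 2.0
        s_min = min(s_min, s)
        if s - s_min > a:
            return n
    return None


# Hypothetical record: the mean moves from about 0 to about 1 after 3 samples.
data = [0.1, -0.2, 0.3, 1.2, 0.9, 1.4, 1.1]
n_rec = page_recursive(data, mu0=0.0, delta=1.0, a=1.5)
n_min = page_min_form(data, mu0=0.0, delta=1.0, a=1.5)
print(n_rec, n_min)
```

The recursive form needs only one stored value per test, which is what makes the on-line implementation inexpensive.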

2.  Unknown  Magnitude  of  Change 

In this case we may assume that $\mu_0$ is known, but $\mu_1$ is not. A minimum jump magnitude $\Delta_{min}$ to be detected is fixed a priori. Two tests are run in parallel, corresponding to the two possible directions (increasing or decreasing mean).

For detecting a decrease in the mean we determine the stopping time $N$ by observing when the maxima process drops down by a, the detection threshold (see Figure 3.2):

$N = \inf\left\{ n : \max_{1 \le k \le n} S_k - S_n > a \right\}$

where  (3-6)

$S_n = \sum_{k=1}^{n} \left( x_k - \mu_0 + \frac{\Delta_{min}}{2} \right), \quad S_0 = 0.$

Similarly, for detecting an increase in the mean we define

$N = \inf\left\{ n : S_n - \min_{1 \le k \le n} S_k > a \right\}$

where  (3-7)

$S_n = \sum_{k=1}^{n} \left( x_k - \mu_0 - \frac{\Delta_{min}}{2} \right), \quad S_0 = 0.$

Figure 3.2. Page-Hinkley Stopping Rule as the Process of Global Minima (for a) and Global Maxima (for b): a. Detecting Upward Change; b. Detecting Downward Change

For a downward change, the change time $\nu$ is estimated to be the time of the last maximum before detection. Similarly, for an upward change, the change time $\nu$ is estimated to be the time of the last minimum before the stopping time. Notice that this test corresponds to a linear transformation $g(x) = x$ as described in Chapter II with bias terms $\pm \Delta_{min}/2$.

Figure 3.3 illustrates the recursive version of Page's test when $\Delta_{min}$, used in place of the unknown change magnitude, is set to 0.2, the change time is at 100 and the SNR of the input signal is -3 dB.
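
The two tests run in parallel, together with the change-time estimate taken at the last extremum, can be sketched as below. This is an illustrative sketch only: it assumes the cumulative sums use increments x_k - mu0 - delta_min/2 (increase test) and x_k - mu0 + delta_min/2 (decrease test), that both start from zero, and that ties in the extremum are resolved in favor of the most recent sample. The function name and return format are hypothetical.

```python
def two_sided_page(x, mu0, delta_min, a):
    """Run the increase and decrease tests in parallel.

    Returns (alarm_time, direction, time_of_last_extremum) with 1-based
    times, or None if neither test crosses the threshold a.
    """
    s_up = s_dn = 0.0
    min_up, t_min = 0.0, 0   # running minimum of the increase-test sum
    max_dn, t_max = 0.0, 0   # running maximum of the decrease-test sum
    for n, xn in enumerate(x, start=1):
        s_up += xn - mu0 - delta_min / 2.0
        s_dn += xn - mu0 + delta_min / 2.0
        if s_up <= min_up:
            min_up, t_min = s_up, n
        if s_dn >= max_dn:
            max_dn, t_max = s_dn, n
        if s_up - min_up > a:
            return n, "up", t_min
        if max_dn - s_dn > a:
            return n, "down", t_max
    return None


# Hypothetical noise-free records with a unit jump after three (two) samples.
up_result = two_sided_page([0, 0, 0, 1, 1, 1, 1], mu0=0.0, delta_min=0.5, a=2.0)
down_result = two_sided_page([0, 0, -1, -1, -1, -1], mu0=0.0, delta_min=0.5, a=2.0)
print(up_result, down_result)
```

In both examples the last extremum falls on the final sample before the jump, so the estimated change time is the sample that follows it.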

D.  PERFORMANCE  EVALUATION  OF  CUMSUM  PROCEDURES 

As shown in Chapter II, we characterize the performance by the mean time between false alarms T and the mean delay for detection D. The asymptotic ratio between log T and D was shown to be defined by (2-53) and (2-55):

$\eta = \lim_{a \to \infty} \frac{\log T}{D} \ge \underline{\eta}$  (3-8)

where a is the threshold of the test. This relationship is influenced by two factors. The first is the transformation or nonlinearity $g(x)$. The second is the statistical properties of the observations before the change (the root $h(\theta_0)$ is a function of the SNR) and after the change ($E\{g(x) \mid \theta_1\}$).

In the sequel, several nonlinearities are presented and analyzed in different noise environments. Special attention is given to those situations which result in equality in (3-8), namely:

$\eta = \lim_{a \to \infty} \frac{\log T}{D} = \underline{\eta}$

resulting in an easy way to calculate the asymptotic performance of the detector.

Notice that for the cases where (3-8) is an equality, the relationship (log T)/D enables a comparison of Lorden's bounds (2-46), (2-47) and Wald's bounds (2-48), (2-50) for the pair (T,D) with the correct performance measure (3-10). Both bounds are based on the root of the moment generating function $h(\theta_0)$ and the statistics after the disorder, $E\{g(x) \mid \theta_1\}$. The following subsections present some examples for which performance curves are specified by calculating pairs (T,D) for many values of a, the stopping boundary, and k, the bias term. Thus, the use of the approximating equations for (T,D) enables us to find the pair (a,k) for a given performance requirement (T,D).

Figure 3.3. Detecting a Change in the Mean of Gaussian Observations using Page's Test


1.  Parametric  Detection 

For  parametric  detection  schemes  it  is  assumed  that  the  general  form 
of  the  statistics  before  and  after  the  change  is  known.  If  the  parameters  after 
the  change  are  not  known,  composite  testing  techniques  could  be  used  as 
shown  in  the  sequel. 

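To illustrate the performance curves, we consider the situations where the noise distributions before and after the change are both Gaussian,

$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-x^2/2\sigma^2)$

and also the case where both densities are Gauss-Gauss mixtures (Kassam, 1987):

$P(x) = (1-\varepsilon)\frac{1}{\sqrt{2\pi\sigma_0^2}}\, e^{-x^2/2\sigma_0^2} + \varepsilon\frac{1}{\sqrt{2\pi\sigma_1^2}}\, e^{-x^2/2\sigma_1^2}$  (3-9)

with variance $\sigma^2 = (1-\varepsilon)\sigma_0^2 + \varepsilon\sigma_1^2$.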

The Gauss-Gauss mixture density consists of the first two terms in Middleton's Class A model, where the noise density function is modeled by an infinite weighted sum of Gaussian densities with decreasing weights and increasing variances, and has been used to model interfering waveforms (pulses) and narrowband noise. The parameter $\varepsilon$ indicates the amount of contamination and is typically in the range (0, 0.25). For small enough values of $\varepsilon$, the behavior of $P(x)$ near the origin is dominated by that of the $\sigma_0^2$ component. For large values of $|x|$, the $\sigma_1^2$ component dominates the behavior of $P(x)$ since its tails decay at a slower rate than do those of the $\sigma_0^2$ component. Thus, the relative strength of the contamination is given by the power ratio $\gamma^2 = \sigma_1^2/\sigma_0^2$. Adjusting the parameters $(\varepsilon,\gamma)$ we can determine the performance of the cumsum procedures for a wide range of distributions including those with heavy tails.
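As a rough numerical illustration of the two-component mixture described above, the following sketch evaluates the density and checks its variance by numerical integration. The parameter values (`eps = 0.1`, `sigma0 = 1`, `sigma1 = 5`) and all function names are illustrative assumptions, not values taken from the text.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def mixture_pdf(x, eps, sigma0, sigma1):
    """Gauss-Gauss mixture: weight (1 - eps) on sigma0, weight eps on sigma1."""
    return (1.0 - eps) * gaussian_pdf(x, sigma0) + eps * gaussian_pdf(x, sigma1)

def mixture_variance(eps, sigma0, sigma1):
    """Variance of the zero-mean mixture: (1 - eps) sigma0^2 + eps sigma1^2."""
    return (1.0 - eps) * sigma0 ** 2 + eps * sigma1 ** 2

def trapezoid(f, lo, hi, n=20000):
    """Plain trapezoidal rule on [lo, hi] with n panels."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * step)
    return total * step

# Illustrative (assumed) parameters: 10% contamination, gamma^2 = 25.
eps, sigma0, sigma1 = 0.1, 1.0, 5.0
area = trapezoid(lambda x: mixture_pdf(x, eps, sigma0, sigma1), -60.0, 60.0)
second_moment = trapezoid(lambda x: x * x * mixture_pdf(x, eps, sigma0, sigma1), -60.0, 60.0)
print(area, second_moment, mixture_variance(eps, sigma0, sigma1))
```

The integrated second moment should agree with the closed-form variance, which is a quick way to check that a chosen pair $(\varepsilon,\gamma)$ gives the intended noise power.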

A second disorder situation arises under the assumption that before the disorder $P_0(x)$ is Gaussian while after the disorder $P_1(x)$ is a Gauss-Gauss mixture. We consider the linear detector $g(x) = x$, the nonlinear log-likelihood detector, and the locally optimal energy detector $g(x) = x - 1$.


a.  Detecting  Disorder  in  Gaussian  Measurements 

If $g(x)$ is the log-likelihood nonlinearity, then it has been shown in Chapter II that in the limiting situation the bound is tight, i.e., $\underline{\eta} = \eta = \bar{\eta}$, where

$$\eta = \lim_{T\to\infty}\frac{\log T}{D} = I(\theta_1,\theta_0),$$

where $I(\theta_1,\theta_0)$ is the Kullback-Leibler number defined in (2-23). In the case of a change in the mean, i.e., $x_i \sim N(\mu_0,\sigma^2)$ for $i < \nu$ and $x_i \sim N(\mu_1,\sigma^2)$ for $i \ge \nu$, the Kullback-Leibler information number is given by (Therrien, 1989)

$$\eta = \lim_{T\to\infty}(\log T / D) = \frac{(\Delta\mu)^2}{2\sigma^2} \qquad (3\text{-}10)$$

with

$$\Delta\mu = \mu_1 - \mu_0.$$

Thus the result can be directly related to the signal-to-noise ratio $\Delta\mu/\sigma$. Notice that this result is consistent with the result obtained in (3-2) for detecting jumps in the mean of i.i.d. Gaussian observations using the nonlinearity

$$g(x) = \frac{\Delta\mu}{\sigma^2}\left(x - \mu_0 - \frac{\Delta\mu}{2}\right),$$

which results from the log-likelihood ratio test. For this nonlinearity:

$$\eta = E\{g(x)\mid\theta_1\} = \frac{(\Delta\mu)^2}{2\sigma^2}.$$
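The mean-change detector can be sketched in a few lines. This is only an illustration under stated assumptions: the recursion is the Page statistic $S_i = \max\{0, S_{i-1} + g(x_i)\}$ with an alarm when $S_i \ge a$, the nonlinearity is the Gaussian log-likelihood ratio $g(x) = (\Delta\mu/\sigma^2)(x - \mu_0 - \Delta\mu/2)$, and the function names, the noiseless test data and the threshold `a = 2.0` are hypothetical.

```python
def page_test(samples, g, a):
    """Run Page's test; return (first 0-based index with S_i >= a or None, trajectory)."""
    s = 0.0
    trajectory = []
    for i, x in enumerate(samples):
        s = max(0.0, s + g(x))  # the statistic is reset at zero, never below
        trajectory.append(s)
        if s >= a:
            return i, trajectory
    return None, trajectory

def mean_change_llr(mu0, mu1, sigma):
    """Log-likelihood ratio nonlinearity for a jump in the mean from mu0 to mu1."""
    dmu = mu1 - mu0
    return lambda x: (dmu / sigma ** 2) * (x - mu0 - dmu / 2.0)

def eta_mean_change(mu0, mu1, sigma):
    """Asymptotic performance measure (delta mu)^2 / (2 sigma^2)."""
    return (mu1 - mu0) ** 2 / (2.0 * sigma ** 2)

# Noiseless toy record (assumed): mean 0 for five samples, then mean 1.
g = mean_change_llr(0.0, 1.0, 1.0)
samples = [0.0] * 5 + [1.0] * 10
alarm, trajectory = page_test(samples, g, a=2.0)
print(alarm, eta_mean_change(0.0, 1.0, 1.0))
```

Before the change each increment is negative, so the statistic stays at zero; after the change it climbs by 0.5 per sample until the threshold is reached.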

In the case of detecting a change in the variance of zero-mean Gaussian i.i.d. observations, the log-likelihood ratio results in a square-law type detector and is given by

$$g(x) = cx^2 + \ln\gamma$$

where

$$c = \frac{1}{2}\,\frac{\sigma_1^2 - \sigma_0^2}{\sigma_1^2\sigma_0^2}, \qquad \gamma = \frac{\sigma_0}{\sigma_1}.$$

Notice that for detecting an upward change ($\gamma < 1$), $c$ is positive and $\ln\gamma$ is negative, while for detecting a downward change ($\gamma > 1$), $c$ is negative and $\ln\gamma$ is positive. This explains the behavior of the Page test as illustrated in Figure 3.5. Thus, the performance measure is given by

$$\eta = E\{g(x)\mid\theta_1\} = c\sigma_1^2 + \ln\gamma = \frac{1}{2}\left[\gamma^{-2} - 1\right] + \ln\gamma \qquad (3\text{-}11)$$

which is, as expected, the Kullback-Leibler information number for this case.

The  closed  form  in  which  the  asymptotic  performance  measure  is 
given  allows  one  to  compute  easily  the  performance  curves. 
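The closed form for the variance-change case can be evaluated directly. The sketch below assumes the square-law nonlinearity $g(x) = cx^2 + \ln\gamma$ with $c = (\sigma_1^2 - \sigma_0^2)/(2\sigma_0^2\sigma_1^2)$ and $\gamma = \sigma_0/\sigma_1$; the function names and the choice $\sigma_0 = 1.2$, $\sigma_1 = 1$ (so that $\gamma = 1.2$, a downward change) are illustrative.

```python
import math

def square_law_params(sigma0, sigma1):
    """Coefficient c and ratio gamma of the square-law log-likelihood detector."""
    c = (sigma1 ** 2 - sigma0 ** 2) / (2.0 * sigma0 ** 2 * sigma1 ** 2)
    gamma = sigma0 / sigma1
    return c, gamma

def mean_of_g(sigma, c, gamma):
    """E{c x^2 + ln(gamma)} for zero-mean Gaussian x with standard deviation sigma."""
    return c * sigma ** 2 + math.log(gamma)

def eta_variance_change(gamma):
    """Performance measure 0.5 * (gamma^-2 - 1) + ln(gamma)."""
    return 0.5 * (gamma ** -2 - 1.0) + math.log(gamma)

sigma0, sigma1 = 1.2, 1.0          # assumed values giving gamma = 1.2
c, gamma = square_law_params(sigma0, sigma1)
drift_before = mean_of_g(sigma0, c, gamma)   # mean increment before the change
drift_after = mean_of_g(sigma1, c, gamma)    # mean increment after the change
print(drift_before, drift_after, eta_variance_change(gamma))
```

The drift is negative before the change and positive after it, and the post-change drift coincides with the performance measure, as the derivation requires.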

Figure 3.4 illustrates the performance curves when detecting a disorder in the mean of Gaussian measurements (as illustrated by Figure 3.3) using the optimal nonlinearity $g(x) = \frac{\Delta\mu}{\sigma^2}\left(x - \mu_0 - \Delta\mu/2\right)$ for different signal-to-noise ratios (Equation 3-10). The predicted delays as a function of the SNR agree with the simulation results shown in Figure 3.3 within a tolerance of up to 10 samples.

Figure 3.5 illustrates a changing-variance Gaussian signal with $\gamma = 1.2$ (downward change) and change time at 150. The optimal Page test applied to this signal, using the square-law nonlinearity $g(x) = cx^2 + \ln\gamma$, is also shown. Notice that in this case of $\gamma > 1$, $E\{g(x)\mid\theta_0\} < 0$ while $E\{g(x)\mid\theta_1\} > 0$, as needed.

Figure 3.6 illustrates the performance curves for the square-law detector nonlinearity (3-11). Notice that when $\sigma_1 \to \sigma_0$, which means that the changes become undetectable, the bound given by Equation (3-11) becomes noninformative since $\eta \to 0$. Thus, the delays obtained for values of $\gamma$ approaching 1 are higher than those obtained for values of $\gamma$ distant from 1. Notice also the bell-curve shape of the ARL function for this detection scheme (which is consistent with the example shown in Figure 2.9).
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Figure  3.4.  Performance  Curves  for  Page's  Test  Implemented  with  the  Linear 

Detector $g(x) = \frac{\Delta\mu}{\sigma^2}\left(x - \mu_0 - \Delta\mu/2\right)$

Figure 3.5. Changing Variance Gaussian Signal and the Corresponding Page Test Implemented with the Square Law Nonlinearity

Figure 3.6. Performance Curve Obtained for Page's Test Implemented with the Square Law Detector $g(x) = cx^2 + \ln\gamma$
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b.  Locally  Optimum  Energy  Detector  for  Spectral  Signatures 

Consider the case where we observe the energy spectral density of a signal. Under the "no change" hypothesis we assume without loss of generality that the background noise process $\{y\}$ is normalized (i.e., $\sigma = 1$) White Gaussian Noise (WGN) and is grouped in disjoint blocks of $M$ points for processing via the Discrete Fourier Transform (DFT). We assume that the sample blocks are mutually independent. The squared magnitudes of the $M$ complex outputs of the DFT are computed, and these random outputs, denoted by $\{x_{i,m}\}$, $i = 1, 2, \ldots$, $m = 1, 2, \ldots, M$, where $i$ is the block number and $m$ is the frequency bin number, form the periodogram and are available as the observations for the detection procedures. Namely,

$$\{x_{i,m}\} = \left|\mathrm{DFT}\{y_{i,m}\}\right|^2$$

where

$$y_{i,m} = y(iM + m), \qquad i = 1, 2, \ldots, \quad m = 1, 2, \ldots, M.$$

Here we are interested in detecting a change within a specific frequency bin, although the method described here can also be used to detect a change from block to block as was done by Broder (Broder, 1990). Thus, our method modifies Wolcin's method (Wolcin, 1983) by looking for a change in a direction (frequency) orthogonal to the direction (block) used by Wolcin. Moreover, we use a narrow "window" of frequencies to detect changes within several frequency bins in order to detect a certain spectral signature.

Under the white Gaussian noise assumption, the variables $\{X_{i,m}\}$, except for the first frequency bin $m = 1$, are independent and identically distributed with an exponential distribution of unit mean, i.e., a $\chi^2_2$ distribution (Kay, 1988). Under the change hypothesis, the distribution of $\{X_{i,m}\}$, containing the signal in addition to the WGN, will also be presumed to be exponential but now with mean $\mu_{i,m} > 1$. This is due to the fact that under the change hypothesis $X_{i,m}$ has a noncentral $\chi^2_2$ distribution with a noncentrality parameter (Whalen, 1971) $\lambda > 0$; thus the mean $\mu$ of the noncentral $\chi^2_2$ distribution is given by

$$\mu = \lambda + 1 > 1.$$

Thus, if we assume that after the disorder $\mu_{i,m}$ does not depend on $i$, we have

$$\mu_{i,m} = \mu_m.$$

This is the case when the signal itself is also a Gaussian signal which is independent of the background WGN. Hence, the original hypothesis test of

$$H_0:\ \{y_i\} \sim \text{WGN}$$

versus

$$H_1:\ \{y_i\} \sim \text{Gaussian signal} + \text{WGN}$$

in the signal domain is equivalent to the following hypothesis test in the spectral domain:

$$H_0:\ X_{i,m} \sim \exp\{-X_{i,m}\}, \qquad i = 1, 2, \ldots, \quad m = 1, \ldots, M$$

versus (3-12)

$$H_1:\ X_{i,m} \sim \mu_m^{-1}\exp\{-X_{i,m}/\mu_m\}, \qquad i = 1, 2, \ldots, \quad m = 1, \ldots, M.$$

Because the parameters $\mu_m$ are not known a priori, Page's test with the optimal nonlinearity, the log-likelihood ratio, cannot be implemented. Thus, we will use composite hypothesis techniques such as the Locally Most Powerful (LMP) test statistic. In Chapter II we introduced the locally optimum nonlinearity

$$g_{lo}(x) = \frac{\partial P(x;\theta)/\partial\theta}{P(x;\theta)} = \left.\frac{\partial}{\partial(\Delta\theta)}\ln\frac{P(x;\theta+\Delta\theta)}{P(x;\theta)}\right|_{\Delta\theta = 0}$$

where $P(x;\theta)$ denotes the density of the observations conditioned on the parameter $\theta$. This test measures small deviations from the "null" hypothesis; hence, as was shown in Chapter II, it maximizes the efficacy (incremental signal-to-noise ratio) of the test.

Using this function for testing between the hypotheses $\mu = 1$ and $\mu > 1$ for the case of univariate exponential distributions yields the following nonlinearity:

$$\begin{aligned}
g_{lo}(x) &= \lim_{\mu\downarrow 1}\frac{\partial}{\partial\mu}\log\frac{\mu^{-1}\exp\{-x/\mu\}}{\exp\{-x\}} \\
&= \lim_{\mu\downarrow 1}\frac{\partial}{\partial\mu}\left(-\log\mu - x/\mu + x\right) \\
&= \lim_{\mu\downarrow 1}\left(x/\mu^2 - \mu^{-1}\right) \\
&= x - 1.
\end{aligned}$$

Thus, $g_{lo}(x)$ does not depend on $\mu$ after the change, which results in a locally most powerful test for all $\mu > 1$.

Implementing Page's test for bin number $m$ yields

$$S_{i,m} = \max\{0,\ S_{i-1,m} + g(x_{i,m})\}$$

where

$$g(x_{i,m}) = x_{i,m} - 1 - k \qquad (3\text{-}13)$$

and $k$ is a positive parameter, or reference value, needed to bias the test for the null hypothesis such that $E\{g(x_{i,m})\mid\mu_m = 1\} < 0$, since Page's test performs better when the mean of the nonlinearity before the change takes place is negative as opposed to zero.
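A minimal sketch of the per-bin energy detector follows. It assumes the periodogram is normalized by the block length $M$ (so that unit-variance white noise gives unit-mean bins), uses a direct DFT for self-containment, and indexes bins from 0 (DC) rather than from 1 as in the text. The names `periodogram_block` and `energy_page_test`, and the toy inputs, are hypothetical.

```python
import cmath

def periodogram_block(block):
    """Return |DFT|^2 / M for one block of M samples (direct O(M^2) DFT).

    Index 0 is the DC bin. The 1/M normalization is an assumption made here
    so that unit-variance white noise yields bins with unit mean.
    """
    m = len(block)
    out = []
    for k in range(m):
        acc = sum(y * cmath.exp(-2j * cmath.pi * k * n / m) for n, y in enumerate(block))
        out.append(abs(acc) ** 2 / m)
    return out

def energy_page_test(x_bin, k, a):
    """Page's test on one bin with g(x) = x - 1 - k; first index with S >= a, or None."""
    s = 0.0
    for i, x in enumerate(x_bin):
        s = max(0.0, s + (x - 1.0 - k))
        if s >= a:
            return i
    return None

# Toy inputs (assumed): a constant block puts all its energy in the DC bin.
spectrum = periodogram_block([1.0] * 8)
# One bin observed over ten blocks: unit energy, then energy 3 after block 4.
alarm = energy_page_test([1.0] * 4 + [3.0] * 6, k=0.5, a=4.0)
print(spectrum[0], alarm)
```

In use, each block of the record would be passed through `periodogram_block`, and the values of one bin collected across blocks would form the sequence given to `energy_page_test`.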

At this point it is important to notice a robustness property of this detector. Since the method is based upon detecting changes in the energy (periodogram), and since it is assumed that the disorder is independent of the background noise, the presence of a signal with a certain frequency component will increase the total energy in the corresponding frequency bin, and it is this increase that is detected. Hence, the detector need not assume a specific model for the signal.

The performance of Page's test (3-13) is determined by the parameters $a$ and $k$. Hence, with two degrees of freedom the test results in many pairs $(a,k)$ that yield the same performance. The problem is to find a specific pair which results in a high detection probability. In order to determine the performance in this situation, notice that since the parameter $\mu$ is not known a priori, the performance measure $\eta$ cannot be determined, since $E\{g(x)\mid\theta_1\}$ is not explicitly known. Thus, we shall use Lorden's bounds (2-46), (2-47) and Wald's bounds (2-48), (2-50) to obtain informative bounds for the false alarm rate. In order to obtain these bounds it is necessary to find the root $h$ of the moment generating function identity (2-11) before the disorder (Broder, 1990):

$$\begin{aligned}
1 &= E\{\exp\{h\,g(X_{i,m})\}\mid\mu_m = 1\}, \qquad m = 1, \ldots, M \\
&= E\{\exp\{h\,[X_{i,m} - 1 - k]\}\mid\mu_m = 1\} \\
&= \exp\{-h(1+k)\}\cdot E\{\exp\{hX_{i,m}\}\mid\mu_m = 1\} \\
&= \exp\{-h(1+k)\}/(1-h). \qquad (3\text{-}14)
\end{aligned}$$
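The nonzero root of this identity has no closed form, but it is easy to find numerically. The sketch below solves $\exp\{-h(1+k)\} = 1 - h$ by bisection; the bracketing interval (from the minimizer $\ln(1+k)/(1+k)$ of the residual up to 1) is a choice made here, and the function names are hypothetical. The bound $T \ge \exp\{h\,a\}$ is the form of (2-46) quoted in the text.

```python
import math

def root_h(k, tol=1e-12):
    """Nonzero root h in (0, 1) of exp(-h (1 + k)) = 1 - h, for bias k > 0."""
    if k <= 0.0:
        raise ValueError("the bias k must be positive for a root in (0, 1)")

    def residual(h):
        return math.exp(-h * (1.0 + k)) - (1.0 - h)

    # residual is convex, zero at h = 0, negative at its minimizer, positive at h = 1.
    lo = math.log(1.0 + k) / (1.0 + k)
    hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def false_alarm_lower_bound(h, a):
    """Lower bound exp(h a) on the mean time between false alarms."""
    return math.exp(h * a)

for k in (0.5, 1.0, 2.0):
    h = root_h(k)
    print(k, h, false_alarm_lower_bound(h, a=10.0))
```

The printed roots increase with $k$ and stay below 1, and the DFT length does not enter the computation.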

The root is shown to be a function of the bias term $k$. Figure 3.7 illustrates this relationship. Notice that the root $h$ does not depend on the DFT length. This fact will be shown to be the key to the surprising observation, shown in the sequel, that the SNR per bin does not depend on the DFT length.

Figure 3.7. The Root of the Moment Generating Function Identity (2-11) for $g(x) = x - 1 - k$ as a Function of the Bias Term $k$


Notice that the root is upper bounded by $h < 1$. Recall that by (2-46) $T \ge \exp\{h(\theta_0)\,a\}$. Thus, large values of $h$ are desired. Recall also that $\eta$ is lower bounded by $\underline{\eta}$. Consequently, for a given false alarm rate, a larger $\underline{\eta}$ corresponds to a smaller delay. Thus, from (2-55) it is clear that larger values of $h$ are desired, which means that biasing the test with larger values of $k$ is favorable.

In order to improve the poor statistical properties of the periodogram (standard deviation of the order of the mean), a window of length $W = 3$ that groups the expected frequency bin and the two neighboring frequency bins was taken. Thus the statistic function $g(x)$ was modified to

$$g(x_i) = \frac{1}{3}\sum_{m=m_l}^{m_l+2}\left(x_{i,m} - 1 - k\right) \qquad (3\text{-}15)$$

where $m_l$, $m_l+1$, $m_l+2$ are the frequency bins used by the window. A typical time/frequency sample grid is shown in Figure 3.8.


Figure  3.8.  Time/Frequency  Sample  Grid 

Notice that in this case the root location depends on the window length, since for that case the moment generating function identity has the form

$$1 = E\left\{\exp\left[\frac{h}{3}\sum_{m=m_l}^{m_l+2}\left(X_{i,m} - 1 - k\right)\right]\,\Big|\,\mu_m = 1\right\} = \prod_{m=m_l}^{m_l+2} E\left\{\exp\left[\frac{h}{3}\left(X_{i,m} - 1 - k\right)\right]\,\Big|\,\mu_m = 1\right\}.$$

Figure 3.9 illustrates the root location for the given window $W = 3$.

Figure 3.9. Root of Moment Generating Function for $g(x_i) = \frac{1}{3}\sum_{m=m_l}^{m_l+2}\left(x_{i,m} - 1 - k\right)$, Implementing Eq. (3-15)
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Notice  that  the  root  is  upper  bounded  by  3.  This  implies  that  the  mean  time 
between false alarms $T$ will be larger in this case than in the previous one, since with the same bias level a higher root value is obtained. However, if the averaging of the window frequencies were done by the function

$$g(x_i) = \sum_{m=m_l}^{m_l+2}\left(x_{i,m} - 1 - k\right) \qquad (3\text{-}17)$$

the root location would be the same as in (3-14), i.e., upper bounded by 1. This may imply that the averaging method (3-15) performs better than the others.
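The two root locations can be checked with a short worked derivation. It assumes that the windowed statistic (3-15) is the average $\frac{1}{3}\sum(x_{i,m} - 1 - k)$, that (3-17) is the same sum without the factor $\frac{1}{3}$, and that the three bins are independent with unit-mean exponential distributions before the change.

```latex
% Averaged window (3-15): the identity factors over the three independent bins.
1 = \prod_{m=m_l}^{m_l+2}
    E\Bigl\{\exp\bigl[\tfrac{h}{3}(X_{i,m}-1-k)\bigr] \Bigm| \mu_m = 1\Bigr\}
  = \left[\frac{\exp\{-\tfrac{h}{3}(1+k)\}}{1-\tfrac{h}{3}}\right]^{3}
% The bracket equals 1 exactly when h/3 solves (3-14), so
%   h_{(3-15)}(k) = 3\,h_{(3-14)}(k) < 3 .

% Unscaled sum (3-17):
1 = \left[\frac{\exp\{-h(1+k)\}}{1-h}\right]^{3}
% which has the same positive root as (3-14), so
%   h_{(3-17)}(k) = h_{(3-14)}(k) < 1 .
```

Under these assumptions the window only rescales the root by the factor applied to the sum, which is why the bound of 3 appears for (3-15) and the bound of 1 for (3-17).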

The problem is that we would like to determine the performance for a given pair $(a,k)$, but since the function $g(x)$ was based upon a suboptimal hypothesis test, only bounds (Lorden and Wald) can be derived. To resolve this problem the following method is presented.

Consider that we are given the desired mean time between false alarms $T$ and some minimum value of $\mu$, say $\mu_{\min}$ ($> 1$), for which we can test. In this situation, Page's test using the optimal nonlinearity, the Log-Likelihood Ratio (LLR), can be implemented. This results in

$$g(x) = (1-\lambda)x + \log\lambda \qquad (3\text{-}18)$$

where $\lambda = \lambda_m = 1/\mu_{\min}$, and Page's test (3-13) is implemented with the function (3-18) and a new threshold $a'$. In order to find the relationship between the pairs $(a,k)$ and $(a',k)$ of the tests (3-13) and (3-18) we will use the following analysis:

$$N_{lo} = \inf\{i:\ S_i \ge a\}, \qquad S_i \text{ implemented with (3-13)}$$

$$N_{llr} = \inf\{i:\ S_i \ge a'\}, \qquad S_i \text{ implemented with (3-18)}$$

Hence, we obtain

$$N_{llr} = \inf\{i:\ S_{i-1} + x_{i,m} + \log[\lambda_m]/(1-\lambda_m) \ge a'/(1-\lambda_m)\}$$

$$N_{lo} = \inf\{i:\ S_{i-1} + x_{i,m} - 1 - k \ge a\}. \qquad (3\text{-}19)$$

To achieve the same performance, the following relationships must hold:

$$a = a'/(1-\lambda_m),$$

$$k = -\log[\lambda_m]/(1-\lambda_m) - 1. \qquad (3\text{-}20)$$

Notice now that for the log-likelihood ratio function $h(\theta_0) = 1$. Thus, for the given average time between false alarms, $T$, equation (2-46) becomes

$$T \ge \exp a'.$$

Hence, the following procedure can be implemented:

•  given $T$, the threshold $a'$ which guarantees that requirement is given by

$$a' = \ln T, \qquad (3\text{-}21)$$

•  use (3-20) to find both the threshold $a$ and the bias $k$ needed for implementing the locally optimum test given $T$ and $\mu_{\min}$. Hence, this test is now "tuned" for the desired performance.

To summarize, this procedure allows the use of the optimal nonlinearity in order to find the specific pair $(a,k)$ needed to achieve the performance requirement for the energy detector.
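The tuning procedure can be sketched as follows. The mapping used here is obtained by dividing the log-likelihood recursion $(1-\lambda)x + \log\lambda$ by $(1-\lambda)$ and matching it to $x - 1 - k$, which gives $a = a'/(1-\lambda)$ and $k = -\log\lambda/(1-\lambda) - 1$ with $\lambda = 1/\mu_{\min}$; this reading of (3-20), the function names, and the simulated record (unit-mean exponential data followed by mean 3) are assumptions of this sketch.

```python
import math
import random

def tune_energy_detector(T, mu_min):
    """Return (a_llr, a, k): LLR threshold ln T and the matched energy-detector pair."""
    lam = 1.0 / mu_min
    a_llr = math.log(T)
    a = a_llr / (1.0 - lam)
    k = -math.log(lam) / (1.0 - lam) - 1.0
    return a_llr, a, k

def first_alarm(samples, g, threshold):
    """Page's test: first 0-based index where the statistic reaches the threshold."""
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + g(x))
        if s >= threshold:
            return i
    return None

T, mu_min = 1.0e4, 2.0                     # assumed requirement and minimum mean
a_llr, a, k = tune_energy_detector(T, mu_min)
lam = 1.0 / mu_min

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(200)]          # before: mean 1
samples += [rng.expovariate(1.0 / 3.0) for _ in range(200)]   # after: mean 3

n_llr = first_alarm(samples, lambda x: (1.0 - lam) * x + math.log(lam), a_llr)
n_lo = first_alarm(samples, lambda x: x - 1.0 - k, a)
print(a_llr, a, k, n_llr, n_lo)
```

Because the two statistics differ only by the positive scale factor $(1-\lambda)$, both tests raise their alarm at the same sample on any given record.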
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A  second  and  even  more  practical  way  to  determine  the  test 
parameters  {a,k)  is  by  using  the  SNR  per  bin  which  is  required  to  meet  the 
performance  requirements.  Hereby,  the  notation  S  relates  to  the  signal  and  N 
to  the  noise  (in  the  spectrum  domain).  Decomposition  of  the  data  yields 
(provided that energy exists only in one of the frequency bins)

$$S + N:\quad E\{g(x_i)\mid\theta_1\} = \frac{1}{3}\sum_{m=m_l}^{m_l+2}\left(E\{x_{i,m}\} - 1 - k\right) = \eta/h(k).$$

Thus,

$$\mathrm{SNR} = \mu_m - 1 \qquad (3\text{-}22)$$


Notice that $k$ and $h(k)$ were determined to achieve a given lower bound for $T$; thus, $\eta$ given by (3-8) determines the asymptotic ratio for the desired pair $(T,D)$. Hence, using equation (3-22) enables us to find the corresponding SNR per bin which is required to achieve the desired performance. Figure 3.10 shows the SNR required per bin as a function of the bias term $k$ for different values of the asymptotic measure $\eta$. Notice that each given $k$ corresponds to a certain $T$; thus, the corresponding delay value, $D$, is found from the graph by using the assigned $\eta$ needed for a certain SNR.

Analyzing (3-22) reveals an important result. Larger values of $\eta$ correspond to a smaller delay, $D$, in detecting a disorder. Thus, larger values of $k$ are needed. If the function (3-13) had to be used, the SNR per frequency bin would remain the same. Thus, using (3-15) does not improve the minimal SNR required per bin to achieve some level of detection probability, but improves the overall performance by having a lower false alarm rate. However, if we implement nonlinearity (3-17), decomposition of the signal and noise yields (provided that energy exists only in one of the frequency bins)

$$E\{g(x_i)\mid\theta_1\} = \sum_{m=m_l}^{m_l+2}\left(E\{x_{i,m}\} - 1 - k\right) = \mu - 1 - 3k = \frac{\eta}{h(k)};$$

hence, the minimal SNR per frequency bin is given by

$$\mathrm{SNR} = \mu - 1 = \frac{\eta}{h(k)} + 3k. \qquad (3\text{-}23)$$

Figure 3.10. SNR per Bin for Energy Detector as a Function of the Bias $k$, Implementing Nonlinearity (3-15)


Figure 3.11 illustrates the SNR as a function of the bias $k$ for the nonlinearity (3-17). Hence, there is an SNR improvement of the order of 1-3dB. This is a surprising result because one would expect that, since the root for (3-17) is upper bounded by one as opposed to the root of (3-15) which is upper bounded by 3, the overall performance of (3-15) would be better. Thus, a tradeoff between the delay and the minimal SNR required for detection is determined by the bias $k$. Moreover, analyzing (3-23) reveals an important result. Larger values of $\eta$ correspond to lower delay and better detector performance; thus, larger values of $k$ are needed. But this is opposed to having lower values of $k$, which are needed to obtain the required SNR per bin. Hence, the chosen bias term $k$ should reflect a tradeoff between these two conflicting requirements.

Figure 3.11. SNR per Bin as a Function of the Bias $k$, Implementing Nonlinearity (3-17)

Simulations were performed using the function (3-17) on data records of 4000 samples where the change point was at sample 2000 (i.e., the middle of the record). The Nyquist frequency used was 500Hz, and at the change point the transition was from 62Hz to 156Hz. We used two algorithms, one of which uses a 32-point DFT producing a time/frequency grid of (125x32) points and the other a 128-point DFT producing a time/frequency grid of (30x128) points, where the corresponding processing gains are 12dB and 18dB respectively. Hence, using Figure 3.10 allows one to predict the detection performance. An incoming signal with input SNR of -3dB cannot be detected by using a 32-point DFT since the output SNR is 9dB, which is below the minimum SNR per bin required for detection. By using a 128-point DFT, the output SNR is 15dB, which is about 3dB above the minimal SNR required for detection. The same analysis for signals with input SNR of -6dB reveals that the 32-point DFT cannot detect the changes, while the 128-point DFT copes with the detection successfully. Figure 3.12 illustrates the time/frequency grid for the case of using a 32-point DFT with input SNR of -6dB, while Figure 3.13 illustrates Page's test implemented on bins 19, 20, 21 (bin 20 being the 156Hz bin) by using a 128-point DFT to detect energy at 156Hz with input SNR of -3dB and -6dB respectively. Similarly, Figure 3.14 illustrates Page's test implemented on bins 4, 5, 6 (bin 5 being the 156Hz bin) by using a 32-point DFT with input SNR of -3dB and -6dB respectively.

Figure 3.12. Typical Time/Frequency Grid of (30x128) Points. 128 Point DFT, Input SNR = -6dB

Figure 3.14. Page's Test Implemented on Bins 4, 5, 6 of a 32-Point DFT (panel titles: "page test, 32 points dft"; vertical axis: Amplitude)
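The bin numbers and output SNR values quoted for the simulations can be reproduced with simple arithmetic. The sketch below assumes a sampling rate of 1000Hz (twice the stated Nyquist frequency), takes the processing gains of 12dB and 18dB as given in the text, and uses 12dB as the minimum SNR per bin, a value inferred here from the statement that 15dB is about 3dB above the minimum. The function names are hypothetical.

```python
def nearest_bin(freq_hz, dft_len, fs_hz):
    """Index of the DFT bin closest to freq_hz (bin spacing fs / dft_len)."""
    return round(freq_hz * dft_len / fs_hz)

def output_snr_db(input_snr_db, processing_gain_db):
    """Output SNR per bin as input SNR plus the DFT processing gain, in dB."""
    return input_snr_db + processing_gain_db

def detectable(input_snr_db, processing_gain_db, min_snr_db=12.0):
    """True when the output SNR reaches the assumed minimum SNR per bin."""
    return output_snr_db(input_snr_db, processing_gain_db) >= min_snr_db

FS_HZ = 1000.0                        # assumed: twice the 500Hz Nyquist frequency
GAIN_DB = {32: 12.0, 128: 18.0}       # processing gains as stated in the text

for dft_len in (32, 128):
    for snr_in in (-3.0, -6.0):
        print(dft_len, nearest_bin(156.0, dft_len, FS_HZ),
              output_snr_db(snr_in, GAIN_DB[dft_len]),
              detectable(snr_in, GAIN_DB[dft_len]))
```

With these assumptions the 156Hz component falls in bin 5 of the 32-point DFT and bin 20 of the 128-point DFT, and only the 128-point configuration reaches the assumed minimum.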
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In  order  to  compare  the  detection  performance  of  the  Page  test 
with a conventional detection scheme we refer to Whalen (Whalen, 1971), in which the performance (Receiver Operating Characteristic, ROC) for detecting $M$ independent sinewave samples in white Gaussian noise by using a linear detector (which is the locally optimum detector for Gaussian signals, see Kassam, 1988) is analyzed. Even though that detection is not based on energy, the comparison presented in the sequel indicates better performance of our method. Figure 3.13 illustrates the Page detector implemented on a 128-point DFT. For an incoming signal with SNR of -6dB the delay for detection is 4 blocks, and the minimum SNR required for detection (using the proper bias value to minimize the SNR) is about 12dB for $\eta = 10$ and about 6dB-8dB for $\eta = 1$. The corresponding bounds for the false alarm rate are lO-^o and 10"^ respectively. Figure 3.15 illustrates the ROC for a linear detector for detecting four independent samples (equivalent to a delay in detection of four blocks) of a sinewave in white Gaussian noise (Whalen, 1971, p. 250), where the parameter is the SNR required for detection. For this classical detection scheme the ROC is in terms of $P_{FA}$ versus $P_D$. Thus, to compare the performance of these two methods we refer only to values of $P_D \to 1$, to reflect that the detection is almost surely certain. Figure 3.15 illustrates that for values of 6dB-8dB the performance of the linear detector is very poor since the $P_{FA}$ is in the order of Kh^-lO"^ respectively, while for the Page test it is at least lO"'*. Furthermore, for the linear detector, as the $P_{FA}$ is lowered at a given (fixed) SNR the $P_D$ decreases, while for the Page detector an equivalently lower $P_{FA}$ (corresponding to a higher mean time between false alarms) requires a higher threshold and results in a higher delay, but the detection is still guaranteed. In the operating range above 9dB (which is the typical operating range for this type of detection) the Page test is shown to have better performance than the conventional linear detection.

Figure 3.15. ROC for Detecting Sinewaves in White Gaussian Noise (four samples averaged). From Whalen, 1971.
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2.  Non-Parametric  Detection 

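Here we will consider only the sign detector defined as

$$g(x) = \begin{cases} \ \ 1 & \text{for } x > 0 \\ -1 & \text{for } x < 0 \end{cases}$$

This nonlinearity is sometimes also referred to as the random walk nonlinearity, since the cumulative sum of the outputs $g(x)$ is a random walk. Thus, results from random walk theory can be used. Define:

$$p(\theta) = \Pr\{g(x) = 1\mid\theta\}$$

$$q(\theta) = \Pr\{g(x) = -1\mid\theta\}.$$

If $p(\theta) \ne q(\theta)$ there is a positive probability that the process will drift to $+\infty$ if $p(\theta) > q(\theta)$ (and to $-\infty$ if $p(\theta) < q(\theta)$). Thus assuming that $E\{g(x)\mid\theta_0\} < 0$ yields $p(\theta_0) < q(\theta_0)$, while assuming $E\{g(x)\mid\theta_1\} > 0$ results in $p(\theta_1) > q(\theta_1)$. The moment generating identity is given by

$$E\{\exp\{h(\theta_0)\,g(x)\}\mid\theta_0\} = p(\theta_0)\exp\{h(\theta_0)(+1)\} + q(\theta_0)\exp\{h(\theta_0)(-1)\} = 1.$$

Consider $h(\theta_0) = \ln\dfrac{q(\theta_0)}{p(\theta_0)}$; thus,

$$E\left\{\exp\left\{\ln\frac{q(\theta_0)}{p(\theta_0)}\,g(x)\right\}\Big|\,\theta_0\right\} = p(\theta_0)\cdot\frac{q(\theta_0)}{p(\theta_0)} + q(\theta_0)\cdot\frac{p(\theta_0)}{q(\theta_0)} = q(\theta_0) + p(\theta_0) = 1.$$

Hence,

$$h(\theta_0) = \ln\frac{q(\theta_0)}{p(\theta_0)}$$

and

$$E\{g(x)\mid\theta_1\} = p(\theta_1) - q(\theta_1) > 0.$$

The result is that the lower bound on the performance measure is given by

$$\underline{\eta} = \left[p(\theta_1) - q(\theta_1)\right]\log\frac{q(\theta_0)}{p(\theta_0)}.$$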

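In order to evaluate the performance measure $\eta$, we will use results from random walk theory (see Karlin and Taylor, 1984, p. 109) for the approximation of the ARL function of Page's test, as done by Broder (Broder, 1990):

$$\mathrm{ARL}(\theta) = \frac{1}{q(\theta) - p(\theta)}\left\{\frac{q(\theta)}{q(\theta) - p(\theta)}\left[\left(\frac{q(\theta)}{p(\theta)}\right)^{a} - 1\right] - a\right\} \qquad \text{if } p(\theta) \ne q(\theta). \qquad (3\text{-}24)$$

Since under $\theta_0$, $q(\theta_0) > p(\theta_0)$, the average time between false alarms for large $a$ can be approximated as

$$T \approx \frac{q(\theta_0)}{\left[q(\theta_0) - p(\theta_0)\right]^2}\left[\frac{q(\theta_0)}{p(\theta_0)}\right]^{a}.$$

Under $\theta_1$, $p(\theta_1) > q(\theta_1)$; hence, the average delay for large $a$ is given by

$$D \approx \frac{a}{p(\theta_1) - q(\theta_1)};$$

hence, the performance bound $\eta$ is given by (Broder, 1990)

$$\eta = \lim_{a\to\infty}\frac{\log T}{D} = \lim_{a\to\infty}\frac{\log\left[q(\theta_0)/p(\theta_0)\right]^{a}}{a/\left[p(\theta_1) - q(\theta_1)\right]} = \left[p(\theta_1) - q(\theta_1)\right]\log\frac{q(\theta_0)}{p(\theta_0)}.$$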


Using  this  result  allows  the  comparison  of  Lorden  and  Wald  bounds 
with  the  approximated  results  from  random  walk  theory. 
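The random walk results can be checked against an exact computation for a simple model. The sketch below assumes an integer threshold $a$, steps of $\pm 1$, and a statistic that stays at zero when a negative step occurs at zero; under these assumptions the average run length from zero follows from a first-step recursion and has the closed form $a/(p-q) + q\,[(q/p)^a - 1]/(q-p)^2$. The function names are hypothetical, and the formulas are derived for this model rather than quoted from the references.

```python
def arl_recursive(p, a):
    """Exact mean time to reach level a from 0 for the reflected +/-1 walk.

    d_j = m_j - m_{j+1} satisfies d_0 = 1/p and d_j = 1/p + (q/p) d_{j-1};
    the run length from zero is the sum of d_0 .. d_{a-1}.
    """
    q = 1.0 - p
    d = 1.0 / p
    total = d
    for _ in range(1, a):
        d = 1.0 / p + (q / p) * d
        total += d
    return total

def arl_closed_form(p, a):
    """Closed form of the same quantity (p != q), or a (a + 1) when p = q = 0.5."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return float(a * (a + 1))
    r = q / p
    return a / (p - q) + q * (r ** a - 1.0) / (q - p) ** 2

def false_alarm_approx(p0, a):
    """Large-a approximation of T when q0 > p0."""
    q0 = 1.0 - p0
    return q0 / (q0 - p0) ** 2 * (q0 / p0) ** a

def delay_approx(p1, a):
    """Large-a approximation of the delay D when p1 > q1."""
    return a / (p1 - (1.0 - p1))

a = 10
print(arl_recursive(0.15, a), false_alarm_approx(0.15, a))
print(arl_recursive(0.85, a), delay_approx(0.85, a))
```

For $p(\theta_0) = 0.15$ the approximation of $T$ is already very close to the exact value at $a = 10$, while for $p(\theta_0) = 0.4$ the run length grows far more slowly with $a$.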
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For  the  simulation  results  we  considered  the  symmetric  additive 


signal in noise situation, i.e.,

$$P(x\mid\theta_i) = \begin{cases} P(x - \theta), & i = 0 \\ P(x + \theta), & i = 1 \end{cases}$$

and the noise environments considered were Gaussian and a Gauss-Gauss mixture. In order to calculate the $p(\theta)$ and $q(\theta)$ parameters as a function of the signal and noise parameters, consider the following:

$$\Pr\{g(x) = \pm 1\} = \Pr\{X \gtrless 0\}.$$

Thus, by knowing the mean of the incoming signal we can use the complementary error function to derive both the Gauss and Gauss-Gauss mixture cases, as shown in Figure 3.16. For the Gauss-Gauss mixture

$$p(\theta_1) = (1-\varepsilon)\,p_0(\theta_1) + \varepsilon\,p_1(\theta_1)$$

$$q(\theta_1) = 1 - p(\theta_1)$$

for which $p(\theta_1) > q(\theta_1)$. It follows from the symmetric signal assumption ($\mu_0 = -\mu_1$) that

$$p(\theta_0) = q(\theta_1)$$

$$q(\theta_0) = p(\theta_1)$$

which results in the desired situation of $q(\theta_0) > p(\theta_0)$.

Figure 3.16. $p(\theta)$ as a Function of the Signal and Noise Parameters (Symmetric Case, Gauss-Gauss Mixture)
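The sign probabilities can be computed with the complementary error function. The sketch below assumes the observation is a constant mean plus zero-mean noise, so that $p = \Pr\{X > 0\} = \tfrac{1}{2}\,\mathrm{erfc}\!\left(-\mu/(\sigma\sqrt{2})\right)$ for Gaussian noise, and that for the mixture the two component probabilities are weighted by $(1-\varepsilon)$ and $\varepsilon$. Function names and parameter values are illustrative.

```python
import math

def prob_positive(mean, sigma):
    """Pr{X > 0} for X ~ N(mean, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(-mean / (sigma * math.sqrt(2.0)))

def p_mixture(mean, eps, sigma0, sigma1):
    """Pr{X > 0} when the noise is a Gauss-Gauss mixture with weights (1 - eps, eps)."""
    return (1.0 - eps) * prob_positive(mean, sigma0) + eps * prob_positive(mean, sigma1)

# Assumed example: symmetric means -1 (before) and +1 (after), 10% contamination.
eps, sigma0, sigma1 = 0.1, 1.0, 5.0
p1 = p_mixture(+1.0, eps, sigma0, sigma1)
q1 = 1.0 - p1
p0 = p_mixture(-1.0, eps, sigma0, sigma1)
q0 = 1.0 - p0
print(p0, q0, p1, q1)
```

With symmetric means the output shows $p(\theta_0) = q(\theta_1)$ and $q(\theta_0) = p(\theta_1)$, which is the situation used in the simulations.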


As shown in the previous example, the root of the moment generating function is needed for Lorden's and Wald's approximations. Figure 3.17 illustrates the root position for different pairs of values $(p(\theta_0), q(\theta_0))$. We see that as $p(\theta_0) < 0.5$ becomes larger, the root becomes smaller, which indicates that for a given false alarm rate the delay for detection will be larger, due to the fact that $p(\theta_0)$ approaches $q(\theta_0)$, resulting in a difficult decision situation. In the neighborhood where $q(\theta_0)$ is slightly larger than $p(\theta_0)$, $E\{g(x)\mid\theta_0\} \approx 0$. In this situation, biasing the test is needed since the root approaches zero and the bound $\underline{\eta}$ is no longer informative.


Figure  3.17.  The  Root  of  the  Moment  Generating  Function  for  the  Sign  Test 


Figure 3.18 illustrates Lorden's, Wald's, and the random walk approximation (3-24) as functions of the threshold $a$ for a certain case where, before the disorder, the difference between $p(\theta_0)$ and $q(\theta_0)$ is large enough. The results indicate good detection bounds. Figure 3.19 illustrates the same approximations, but now for the case where $q(\theta_0)$ approaches $p(\theta_0)$; the degradation in performance is shown to be of several orders of magnitude. The values for $q(\theta_0)$ and $p(\theta_0)$ were chosen to simulate two cases of Gauss-Gauss mixtures, resulting in $p(\theta_0) = 0.15$, $q(\theta_0) = 0.85$ for the first case, and $p(\theta_0) = 0.4$, $q(\theta_0) = 0.6$ for the second case.
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Figure  3.18.  Sign  Test.  Mean  Time  between  False  Alarms  for 
$p(\theta_0) = 0.15$, $q(\theta_0) = 0.85$
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Figure  3.19.  Sign  Test  Mean  time  between  False  Alarms  for 

$p(\theta_0) = 0.4$, $q(\theta_0) = 0.6$


To analyze the delay for detection we use a similar technique, but since we now explore the situation after the disorder, we consider the two corresponding cases where $p(\theta_1)$ is larger than $q(\theta_1)$ and where $p(\theta_1)$ approaches $q(\theta_1)$. The results are similar to those obtained in the case of the false alarm rate and are shown in Figures 3.20 and 3.21 in the form of performance curves for Page's test implemented with the sign detector. As in the previous case, $p(\theta_1)$ and $q(\theta_1)$ correspond to the same Gauss-Gauss mixture parameters.

Figure 3.20. Performance Curves for the Sign Detector, $p(\theta_0) = 0.15$, $q(\theta_0) = 0.85$
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E  SUMMARY 

In this chapter we have described the problem of change detection and of the joint estimation of the change time and the model parameters. Within this framework, only the problem of quickest detection has been investigated, by using Page's test. In the parametric framework, the linear
detector  and  the  square  law  detector  were  shown  to  be  optimal  in  the  sense  of 
quickest  detection  of  changes  in  the  mean  and  variance  of  Gaussian 
observations.  In  both  cases  performance  measures  were  derived  and  shown 
to  be  consistent  with  the  actual  results  of  simulations.  A  new  algorithm  for 
detecting  changes  in  the  spectral  energy  was  implemented  based  on  locally 
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optimal  testing  and  shown  to  be  consistent  with  the  analytical  performance 
results  obtained  for  this  test.  The  bias  of  the  test  was  shown  to  reflect  a 
tradeoff  between  the  detector  performance  and  the  minimal  SNR  required  for 
detection.  Finally,  the  issue  of  non-parametric  detection  was  investigated  by 
implementing  Page's  test  with  the  sign  nonlinearity  and  testing  the 
performance  under  Gauss  and  Gauss-Gauss  mixture  noise  distributions. 
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IV.  BROWNIAN  MOTION  APPROXIIvlATlON  TO  CUMSUM 

PROCEDURES 


A.  INTRODUCTION 

In sequential analysis additional simplification results from
approximating sums of independent random variables $S_n$ in discrete time
by a Brownian motion process $\{B(t), t \ge 0\}$ in continuous time. Moreover,
for cases where the observations do not form a Gaussian process, the discrete
time process can be approximated by a Brownian motion process, which is
Gaussian. For further discussion on this subject see Reynolds (Reynolds,
1975).

To understand the motivation for using the Brownian motion process
as a continuous approximation to the random walk (which describes the
cumsum procedures), let $x_1, x_2, \ldots$ be independent and normally
distributed with mean $\mu$ and unit variance. If $\{B(t), t \ge 0\}$ is a
Brownian motion with drift $\mu$, then $S_n = \sum_{i=1}^{n} x_i$ and $B(n)$,
$n = 0, 1, \ldots$ have the same joint distribution.

The analogy is clear: Brownian motion is an interpolation of the discrete
time random walk $S_n$ which preserves the Gaussian distributions to the
extent that a random walk process is approximately normally distributed for
large $n$. Thus, the Brownian motion process may be used as an asymptotic
approximation to a large class of random walks and hence of log-likelihood
ratios. A good reference for a detailed discussion of this point is Siegmund
(Siegmund, 1985). This chapter concentrates primarily on Brownian motion
approximations to cumsum procedures (specifically the Page test). A
continuous Brownian motion process $\{B(t), t \ge 0\}$ is used as an
approximation to the cumulative sums $\sum (g(x) \pm k)$ which form Page's
test. The original problem of detecting a disorder, as described in Chapter II,
is now modified in the sense that it can be viewed as a shift in the drift of a
Brownian motion approximating a cumsum procedure.

1.  Problem  Statement 

Let $\nu$ be the time of shift and let $\mu > 0$ be the amount of shift in the
drift of a standard Brownian motion $\{B(t), t \ge 0\}$, $B(0) = 0$. Consider the
observation process

$$W(t) = \mu (t - \nu)^{+} + B(t), \quad \mu > 0.$$

Thus the observation process is a Brownian motion with drift 0 up to
the point of shift $\nu$, and with drift $\mu$ after that.

The Page test applied to Brownian motion is defined as follows: stop
at the smallest $t$ for which the one-sided test with boundaries 0 and $a$ stops.
The test is repeated if the lower boundary 0 is reached before $a$. Define the
stopping rule as

$$N = \inf\{t : S(t) \ge a\}$$

where

$$S(t) = (W(t) + kt) - \min_{0 \le s \le t} (W(s) + ks)$$

for detecting a one-sided positive shift in the drift and

$$S(t) = \max_{0 \le s \le t} (W(s) + ks) - (W(t) + kt)$$

for detecting a one-sided negative shift in the drift. The variable $k$ is the
reference value or the bias of the test. (Recall from Chapter II that it is
advantageous to bias the test.) Hence, this procedure has two degrees of
freedom, $k$ and $a$, to achieve a given desired performance.
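
The one-sided test above has a simple discrete-time counterpart, sketched below in Python under stated assumptions: the observations are taken at integer times, the function names (`page_statistic`, `page_statistic_direct`, `page_stopping_time`) are illustrative and do not come from the dissertation, and the sample values are made up. The sketch shows that "cumulative sum minus its running minimum" can be computed by the recursion $S_n = \max(0, S_{n-1} + x_n + k)$.

```python
from itertools import accumulate


def page_statistic(samples, k):
    """Recursive form: S_n = max(0, S_{n-1} + x_n + k), with S_0 = 0."""
    stats = []
    s = 0.0
    for x in samples:
        s = max(0.0, s + x + k)
        stats.append(s)
    return stats


def page_statistic_direct(samples, k):
    """Direct form: S_n = C_n - min_{0<=j<=n} C_j, C_n = sum_{i<=n}(x_i + k), C_0 = 0."""
    cums = [0.0] + list(accumulate(x + k for x in samples))
    out = []
    running_min = cums[0]
    for c in cums[1:]:
        running_min = min(running_min, c)
        out.append(c - running_min)
    return out


def page_stopping_time(samples, k, a):
    """First n with S_n >= a, or None if the threshold is never reached."""
    for n, s in enumerate(page_statistic(samples, k), start=1):
        if s >= a:
            return n
    return None


# Hypothetical data: the mean jumps from about 0 to about 1.5 at the fourth sample.
samples = [0.2, -0.1, 0.0, 1.5, 1.4, 1.6]
alarm = page_stopping_time(samples, k=-0.5, a=2.0)
```

The negative reference value keeps the statistic pinned near zero before the change, and the threshold $a$ is the second degree of freedom mentioned above.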

2.  Organization  of  this  Chapter 

The primary goal of this chapter is to analyze the performance of the
Page test using the Brownian motion approximation, namely, the evaluation
of the Average Run Length (ARL) function under the disorder hypothesis
(Delay) and under the no-disorder hypothesis (mean time between false
alarms). These approximations will be compared with the results obtained in
Chapter III, and a new error (bias) term which enables the "training" of the
Brownian motion parameters (drift and variance) and Page's test parameters
($k$ and $a$) will be presented.

In  Section  B,  general  theory  about  diffusion  processes  and  the  related 
stopping  time  problems  is  presented.  The  first  threshold  crossing  time  and 
hitting  probabilities  are  shown  to  be  reduced  to  solving  2nd  order  differential 
equations.  The  relation  to  the  Page  test  is  introduced  and  a  new  bias  term 
which  enables  the  comparison  of  the  accuracy  of  the  calculation  is  presented. 

Section  C  deals  with  the  approximation  to  the  ARL  functions  of  the 
cumsum  procedure  and  an  explicit  form  for  the  bias  is  calculated. 

Simulation  results  are  presented  in  Section  D  and  compared  to 
simulation  results  presented  in  Chapter  III.  Also,  a  new  error  (bias)  term 
which  enables  the  "training"  of  the  Brownian  motion  parameters  and  Page's 
test  parameters  is  introduced. 

A short summary is presented in Section E.

B.  GENERAL THEORY ABOUT DIFFUSION PROCESSES AND RELATED
STOPPING TIME PROBLEMS

In this section general properties of diffusion processes will be presented.
It will be shown that many functionals, including the first threshold crossing
time and associated probabilities, boundary behavior properties and stationary
distributions of cumsum procedures, can be approximated by using
one-dimensional diffusions.

1.  General  Description  and  Definitions 

Definition  (Karlin  and  Taylor,  1981).  A  continuous  time  parameter 
stochastic  process  which  possesses  the  (strong)  Markov  property  and  for 
which  the  sample  paths  X(t)  are  (almost  always)  continuous  functions  of  t  is 
called  a  diffusion  process. 

Consider a diffusion process $\{X(t), t \ge 0\}$ whose state-space is an
interval $I$ with endpoints $l < r$. Such a process is said to be regular if,
starting from any point in the interior of $I$, any other point in the interior
of $I$ may be reached with non-zero probability. Henceforth, without further
mention, we shall consider only regular diffusion processes.

Dynkin Condition (Karlin and Taylor, 1981): A sufficient condition
for a standard process $X(t)$ to be a diffusion process is the Dynkin condition:

$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{|X(t+h) - X(t)| > \varepsilon \mid X(t) = x\} = 0$$  (4-1)

for all $x$ in $I$. □

This relation asserts that large displacements, of order exceeding a
fixed $\varepsilon$, are very unlikely over sufficiently small time intervals.
This is in fact a formalization of the property that the sample paths of the
process are continuous.

All diffusion processes are characterized by the mean and the
variance of the infinitesimal increments. Let $\Delta X(t)$ be the increment in
the process accrued over a time interval of length $h$ (i.e.,
$\Delta X(t) = X(t+h) - X(t)$); then

$$\lim_{h \downarrow 0} \frac{1}{h} E\{\Delta X(t) \mid X(t) = x\} = \mu(x,t)$$

and  (4-2)

$$\lim_{h \downarrow 0} \frac{1}{h} E\{[\Delta X(t)]^2 \mid X(t) = x\} = \sigma^2(x,t).$$

The functions $\mu(x,t)$ and $\sigma^2(x,t)$ are called the drift and diffusion
parameters, respectively. In the time homogeneous case, the functions
$\mu(x,t) = \mu(x)$ and $\sigma^2(x,t) = \sigma^2(x)$ are both independent of $t$.

A Brownian motion process (sometimes called the Wiener process) is
a regular process on the state-space $I$ with parameters $\mu(x) = 0$ and
$\sigma^2(x) = \sigma^2$ for all $x$. Adding a trend $\mu t$ to the Brownian
motion $B(t)$ produces a Brownian motion with drift, $B(t) + \mu t$. In this
case, the drift parameter is $\mu$, while the diffusion parameter remains
$\sigma^2$.

The Brownian motion process $\{B(t), t \ge 0\}$ has the following
properties:

•  $B(0) = 0$.

•  $\{B(t), t \ge 0\}$ has stationary and independent increments.

•  for every $t > 0$, $B(t)$ is normally distributed with mean 0 and variance $\sigma^2 t$.

When $\sigma = 1$, the process is called the standard Brownian motion. Notice
that any Brownian motion can be converted to the standard process by scaling
via $B(t)/\sigma$.

The behavior of the diffusion process $X_t = \{X(t), t \ge 0\}$ can be modeled
by nonlinear stochastic differential equations of the form

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t, \quad t \ge 0$$

with initial condition $X_0$, where $\mu$ and $\sigma$ are the drift and diffusion
parameters as defined by (4-2) and where $\{B_t, t \ge 0\}$ is a standard Brownian
motion. Thus, $dB_t$ has the interpretation of a "white" noise driver. This
notation is shorthand for the integral equation

$$X_t = X_0 + \int_0^t \mu(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_s.$$

This integral representation of a diffusion process demonstrates the Markov
property of the diffusion. That is, given $X_s$, for each $s \ge 0$,
$\{X_t, t > s\}$ and $\{X_t, 0 \le t < s\}$ are independent. This property is easy
to see since for any $t > s > 0$, we can write

$$X_t = X_s + \int_s^t \mu(X_u, u)\,du + \int_s^t \sigma(X_u, u)\,dB_u.$$

This equation indicates that $\{X_t, t \ge s\}$ can be constructed completely from
$X_s$ and $\{B_u, t \ge u \ge s\}$. Thus, with $X_s$ fixed, $\{X_t, t \ge s\}$ is
generated independently of $\{X_t, t < s\}$ since $\{B_t - B_s, t \ge s\}$ is
independent of all the past.
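
The integral equation suggests a simple way to generate approximate sample paths. The following is a minimal sketch of the Euler-Maruyama discretization, which is a standard numerical scheme and not a method taken from this dissertation; the function name `euler_maruyama`, the step count and the seed are illustrative assumptions.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng=None):
    """Approximate a path of dX = mu(X,t) dt + sigma(X,t) dB on [0, t_end].

    mu and sigma are callables (x, t) -> float; returns the list of
    n_steps + 1 path values, starting with x0.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so that the sketch is reproducible
    dt = t_end / n_steps
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n_steps):
        # Increment of standard Brownian motion over dt: N(0, dt).
        d_b = rng.gauss(0.0, math.sqrt(dt))
        # Each step uses only the current state, mirroring the Markov property.
        x = x + mu(x, t) * dt + sigma(x, t) * d_b
        t += dt
        path.append(x)
    return path


# Brownian motion with constant drift 0.5 and unit diffusion parameter.
path = euler_maruyama(lambda x, t: 0.5, lambda x, t: 1.0, 0.0, 2.0, 1000)
```

Because each update depends only on the current state and a fresh Gaussian increment, the construction mirrors the independence argument given above.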

The following theorem determines the parameters of $Y(t) = g[X(t)]$,
where $X(t)$ is a regular diffusion process.

Theorem (Karlin and Taylor, 1981): Let $\{X(t), t \ge 0\}$ be a regular diffusion
process with parameters $\mu(x)$ and $\sigma^2(x)$ whose state-space is defined on
$I = (l, r)$. Let $g$ be a strictly monotone function on $I$ with continuous
second derivative $g''(x)$ for $l < x < r$. Then $Y(t) = g[X(t)]$ defines a
regular diffusion process on $I$ with the parameters

$$\mu_Y(y) = \frac{1}{2}\sigma^2(x)\,g''(x) + \mu(x)\,g'(x)$$

$$\sigma_Y^2(y) = \sigma^2(x)\,[g'(x)]^2, \quad \text{where } y = g(x).$$  (4-3)


2.  Stopping Time Functionals of Diffusion Processes

In this section we analyze stopping time problems using properties of
diffusion processes. It is assumed that $\{X(t), t \ge 0\}$ is a regular, time
homogeneous diffusion process. Let $a$ and $b$ be fixed, subject to
$l < b < a < r$, and let $T(z) = T_z$ be the hitting time of $z$ defined by

$$T_z = \begin{cases} \infty & \text{if } X(t) \ne z \ \ \forall t > 0 \\ \inf\{t \ge 0 ; X(t) = z\} & \text{otherwise.} \end{cases}$$

We use the notation

$$T^* = T_{a,b} = T(a,b) = \min\{T(a), T(b)\}$$

to denote the first time $X(t) = a$ or $X(t) = b$. For processes starting at
$X(0) = x$ in $(b, a)$, this is the same as the exit time of the interval $(b, a)$:

$$T(a,b) = \inf\{t \ge 0 ; X(t) \notin (b, a)\}, \quad X(0) = x \in (b, a).$$


a.  Stopping  Time  Related  Problems 

This section concentrates on three problems related to the first
hitting time of a diffusion which are relevant in the case of the cumsum
procedure.

Problem 1. Find

$$u(x) = \Pr\{T(a) < T(b) \mid X(0) = x\}, \quad b < x < a$$  (4-4)

that is, the probability that the process reaches $a$ before $b$.

Problem 2. Find

$$v(x) = E\{T^* \mid X(0) = x\}, \quad b < x < a$$  (4-5)

which is the mean time to reach either $a$ or $b$.

Problem 3. For a bounded and continuous function $g$, find

$$w(x) = E\left\{\int_0^{T^*} g(X(s))\,ds \;\Big|\; X(0) = x\right\}, \quad b < x < a.$$  (4-6)

Since the sample paths of the diffusion process are continuous (4-1), the
integral $A = \int_0^{T^*} g(X(s))\,ds$ is defined. If $g(x)$ represents a cost
rate incurred whenever the process is in state $x$, then $A$ is the total cost up
to the time when either $a$ or $b$ is first reached. If $g(x) = 1$ for all $x$,
then $A = T^*$, the time to reach $a$ or $b$, so that Problem 2 can be considered
a special case of Problem 3.

b.  Solutions of the Stopping Time Problems

A convenient reference for the solution of these three problems
is Karlin and Taylor (1981, Ch. 15), where it is shown that $u(x)$, $v(x)$, and
$w(x)$ possess two bounded derivatives for $b < x < a$, and that these functions
satisfy the following differential equations:

Solution Equation for Problem 1

$$0 = \mu(x)\frac{du}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2u}{dx^2} \quad \text{for } b < x < a, \quad u(b) = 0,\ u(a) = 1.$$  (4-7)

Solution Equation for Problem 2

$$-1 = \mu(x)\frac{dv}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2v}{dx^2} \quad \text{for } b < x < a, \quad v(b) = v(a) = 0.$$  (4-8)

Solution Equation for Problem 3

$$-g(x) = \mu(x)\frac{dw}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2w}{dx^2} \quad \text{for } b < x < a, \quad w(b) = w(a) = 0.$$  (4-9)

In order to solve these three problems we need to use several new functions.
Let

$$s(x) = \exp\left\{-\int^{x} \frac{2\mu(\xi)}{\sigma^2(\xi)}\,d\xi\right\} \quad \text{for } l < x < r$$  (4-10)

be the scale density of the process. The use of an indefinite integral will
become clear later. Next, the scale function of the process is defined by

$$S(x) = \int^{x} s(\eta)\,d\eta$$  (4-11)

and finally, the speed density is given by

$$m(x) = \frac{1}{\sigma^2(x)\,s(x)} \quad \text{for } l < x < r.$$

Using these definitions, the solution for Problem 1, namely the probability of
hitting $a$ before $b$, is given by:

$$u(x) = \frac{S(x) - S(b)}{S(a) - S(b)}, \quad b < x < a.$$  (4-12)

The solution for Problem 3 is given as

$$w(x) = 2\left\{ u(x)\int_x^a [S(a) - S(\xi)]\,m(\xi)\,g(\xi)\,d\xi + [1 - u(x)]\int_b^x [S(\xi) - S(b)]\,m(\xi)\,g(\xi)\,d\xi \right\}.$$  (4-13)

The solution for Problem 2 is obtained by letting $g(\xi) = 1$.

Notice that the solution for $w(x)$ can also be written as:

$$w(x) = \int_b^a G(x,\xi)\,g(\xi)\,d\xi$$  (4-14)

where:

$$G(x,\xi) = \begin{cases} 2\,\dfrac{[S(x) - S(b)]\,[S(a) - S(\xi)]}{S(a) - S(b)}\,m(\xi) & b < x \le \xi < a \\[2ex] 2\,\dfrac{[S(a) - S(x)]\,[S(\xi) - S(b)]}{S(a) - S(b)}\,m(\xi) & b < \xi \le x < a. \end{cases}$$  (4-15)

The function $G(x,\xi)$ is called the Green function of the process on the
interval $[b, a]$.

Determining the mean time prior to $T^*$ that the process spends in
the interval $[\xi, \xi + \Delta)$ is equivalent to evaluating

$$w(x) = E\left\{\int_0^{T^*} g(X(s))\,ds \;\Big|\; X(0) = x\right\}$$

for

$$g(y) = \begin{cases} 1 & \xi \le y < \xi + \Delta \\ 0 & \text{otherwise} \end{cases}$$

and following the format of (4-14), this is

$$w(x) = E\{\Delta T \mid X(0) = x\} = \int_{\xi}^{\xi + \Delta} G(x,\eta)\,d\eta.$$  (4-16)

We see from (4-16) that $G(x,\xi)\,d\xi$ measures the mean time $\Delta T$ prior
to $T^*$ that the process spends in the infinitesimal interval
$[\xi, \xi + d\xi]$, given $X(0) = x$.
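
The scale density, scale function and speed density can also be evaluated numerically, which is convenient when $\mu(x)$ and $\sigma^2(x)$ have no closed-form antiderivative. The sketch below is an illustration under stated assumptions, not part of the original analysis: it fixes the arbitrary lower limit of the indefinite integrals at $b$ (so that $S(b) = 0$), uses the trapezoidal rule on a uniform grid, and the names `cumtrapz` and `exit_functionals` are hypothetical.

```python
import math


def cumtrapz(values, h):
    """Cumulative trapezoidal integral of grid values with spacing h."""
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (left + right))
    return out


def exit_functionals(mu, sigma2, b, a, n=2000):
    """Return grid xs, hitting probabilities u (4-12) and mean exit times v.

    v is (4-13) with g = 1; mu and sigma2 are callables of x.
    """
    h = (a - b) / n
    xs = [b + i * h for i in range(n + 1)]
    rate = [2.0 * mu(x) / sigma2(x) for x in xs]
    s = [math.exp(-e) for e in cumtrapz(rate, h)]           # scale density (4-10)
    S = cumtrapz(s, h)                                       # scale function (4-11)
    m = [1.0 / (sigma2(x) * sx) for x, sx in zip(xs, s)]     # speed density
    u = [Si / S[-1] for Si in S]                             # (4-12) with S(b) = 0
    # Integral from b to x of [S(xi) - S(b)] m(xi).
    lower = cumtrapz([Si * mi for Si, mi in zip(S, m)], h)
    # Integral from x to a of [S(a) - S(xi)] m(xi).
    upper_full = cumtrapz([(S[-1] - Si) * mi for Si, mi in zip(S, m)], h)
    upper = [upper_full[-1] - c for c in upper_full]
    v = [2.0 * (ui * up + (1.0 - ui) * lo) for ui, up, lo in zip(u, upper, lower)]
    return xs, u, v


# Driftless case with unit diffusion parameter on (b, a) = (-1, 2).
xs, u, v = exit_functionals(lambda x: 0.0, lambda x: 1.0, -1.0, 2.0, n=300)
```

Fixing the lower limit of integration is harmless because $u$, $v$ and $w$ depend on $S$ only through differences and ratios of differences.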


c.  Some Examples of Functional Calculations

Given the solutions (4-12), (4-13) and (4-16), some cases of
interest will be examined.

(1) Standard Brownian Motion. Let $\{X(t), t \ge 0\}$ be a standard
Brownian motion with parameters $\mu(x) \equiv 0$, $\sigma^2(x) \equiv 1$. Then,

$$s(x) = \exp\left\{-\int^{x} 0\,d\xi\right\} = 1.$$

The scale measure is given by

$$S(x) = x.$$

Thus, $u(x)$, the probability of hitting $a$ prior to $b$, with initial state
$x$, is

$$u(x) = \frac{x - b}{a - b}, \quad b < x < a.$$  (4-17)

The speed density in this case is

$$m(x) = 1$$

and the Green function (4-15) for the interval $[b, a]$ is

$$G(x,\xi) = \begin{cases} \dfrac{2(x - b)(a - \xi)}{a - b} & b < x \le \xi < a \\[2ex] \dfrac{2(\xi - b)(a - x)}{a - b} & b < \xi \le x < a. \end{cases}$$

Direct calculation from (4-14) gives

$$v(x) = E\{T_{a,b} \mid X(0) = x\} = \int_b^a G(x,\xi)\,d\xi = (x - b)(a - x), \quad b < x < a.$$  (4-18)

Remark. A process $\{X(t)\}$ whose scale function is linear, $S(x) = x$, is said
to be of natural or canonical scale since the hitting probability (4-17) is
proportional to actual distances.
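
As a quick numerical check of (4-17) and (4-18), the short Python sketch below integrates the Green function of standard Brownian motion over $[b, a]$ and compares the result with $(x-b)(a-x)$. The function names and the values of $b$, $a$ and $x$ are illustrative assumptions only.

```python
def hit_prob(x, b, a):
    """Probability (4-17) that standard Brownian motion hits a before b."""
    return (x - b) / (a - b)


def mean_exit_time(x, b, a):
    """Mean exit time (4-18) from (b, a) starting at x."""
    return (x - b) * (a - x)


def green(x, xi, b, a):
    """Green function of standard Brownian motion on [b, a]."""
    if x <= xi:
        return 2.0 * (x - b) * (a - xi) / (a - b)
    return 2.0 * (xi - b) * (a - x) / (a - b)


def trapezoid(f, lo, hi, n=100):
    """Composite trapezoidal rule; exact here because G is piecewise linear."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


def mean_exit_time_from_green(x, b, a):
    """Integrate G(x, xi) over [b, a], split at the kink xi = x."""
    g = lambda xi: green(x, xi, b, a)
    return trapezoid(g, b, x) + trapezoid(g, x, a)


check = mean_exit_time_from_green(0.5, -1.0, 3.0)
```

Splitting the integral at $\xi = x$ matters only numerically: the Green function is continuous there but has a kink, and each piece is linear in $\xi$.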

Notice that the scale function can be used to rescale the state-space
$(l, r)$ in terms of probabilities of achieving various levels, and this use
motivates the name. If a point $x_0$ is fixed as the origin, we can easily
determine a new scale function by performing a translation, causing
$S(x_0) = 0$, and form a process $Y(t) = S(X(t))$ on the interval
$(S(l), S(r))$. Since $S$ is strictly monotone and twice differentiable, the
use of Theorem (4-3) establishes the infinitesimal parameters of the process
$\{Y(t)\}$:

$$\mu_Y(y) = \frac{1}{2}\sigma^2(x)\,S''(x) + \mu(x)\,S'(x) = 0$$

and

$$\sigma_Y^2(y) = \sigma^2(x)\,[S'(x)]^2 = \sigma^2(x)\,s^2(x), \quad \text{where } y = S(x).$$

The drift vanishes because $S'(x) = s(x)$ and $S''(x) = -[2\mu(x)/\sigma^2(x)]\,s(x)$.
The scale measure for the $\{Y(t)\}$ process is $S_Y(y) = y$; thus, the use of the
scale function enables one to transform a process to a natural scale.

(2) Brownian Motion with Drift. If $\{X(t), t \ge 0\}$ is Brownian
motion with nonzero drift $\mu(x) = \mu$ and variance $\sigma^2$, then:

$$s(x) = \exp(-2\mu x/\sigma^2)$$  (4-19)

$$S(x) = A \exp(-2\mu x/\sigma^2) + B \quad (A \text{ and } B \text{ constants}),$$

and

$$u(x) = \frac{e^{-2\mu x/\sigma^2} - e^{-2\mu b/\sigma^2}}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}, \quad b < x < a.$$  (4-20)
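
The hitting probability (4-20) is easy to evaluate directly. The following Python sketch is illustrative only (the function name `hit_prob_drift` and the numerical values are assumptions); it falls back to the driftless formula (4-17) when $\mu = 0$, where (4-20) becomes the indeterminate form 0/0.

```python
import math


def hit_prob_drift(x, b, a, mu, sigma2=1.0):
    """Probability that Brownian motion with drift mu hits a before b, eq. (4-20)."""
    if mu == 0.0:
        # Driftless limit, eq. (4-17).
        return (x - b) / (a - b)
    c = -2.0 * mu / sigma2
    return (math.exp(c * x) - math.exp(c * b)) / (math.exp(c * a) - math.exp(c * b))


# A positive drift makes the upper boundary more likely than in the driftless case.
p_up = hit_prob_drift(1.0, 0.0, 2.0, mu=0.5)
p_down = hit_prob_drift(1.0, 0.0, 2.0, mu=-0.5)
```

For very small but nonzero $|\mu|$ the two differences of exponentials suffer from cancellation, so a production implementation would probably switch to a series expansion in that regime; the sketch does not attempt this.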


3.  Instantaneous Return Processes and the Relation to Page's Cumsum
Procedure

This section introduces a certain boundary behavior of the diffusion
process that defines an Instantaneous Return process. This process is shown
to describe any cumsum procedure and forms the basis for the approximated
ARL function. It also enables the derivation of a new bias term which is used
to evaluate the accuracy of the approximation.

a.  Instantaneous Return Processes (Karlin and Taylor, 1981)

Consider a diffusion $\{X(t), t \ge 0\}$ on the state-space $I = (l, r)$ and let
$l < b < a < r$. A return process $Z(t)$ relative to $[b, a]$ is shown in Figure
4.1 and is defined as follows: Starting at a point $x_0$ in $(b, a)$, the process
is returned instantaneously to $x_0$ whenever $b$ or $a$ is reached. After such a
return, the subsequent process behaves just like $X(t)$. This is repeated at
each attainment of level $b$ or $a$.

The resulting process $Z(t)$ consists of recurrent cycles of random
time duration $T_1, T_2, T_3, \ldots$, where the $T_i$ are independently and
identically distributed, with the same distribution as
$T^* = \min\{T_a, T_b\}$, the first exit time from the interval $(b, a)$,
starting from $x_0$ (stationary process). It follows from (4-16) that

$$E\{T_i \mid X(0) = x_0\} = \int_b^a G(x_0, \xi)\,d\xi$$  (4-21)

where $G(x_0, \xi)$ is the Green function of the process $X(t)$ relative to the
state-space $(b, a)$.

Let $p(t, y)$ be the density function of $Z(t)$. Thus,

$$p(t, y)\,dy = \Pr\{y < Z(t) < y + dy \mid Z(0) = x_0\}.$$

Define the limiting density of $Z(t)$ as

$$\alpha(y \mid x_0) = \lim_{t \to \infty} p(t, y).$$  (4-22)

To evaluate this limit, consider an interval $[y_1, y_2]$ such that
$b < y_1 < y_2 < a$ and define the indicator process $\{I(t), t \ge 0\}$ by

$$I(t) = \begin{cases} 1 & \text{if } y_1 < Z(t) < y_2 \\ 0 & \text{otherwise.} \end{cases}$$

From Figure 4.1, we see that

$$\Pr\{I(t) = 1\} = E\{I(t)\} = \int_{y_1}^{y_2} p(t, y)\,dy.$$  (4-23)

Recalling the renewal theorem (Ross, 1989) (Feller, 1971), we can deduce that

$$\lim_{t \to \infty} \Pr\{I(t) = 1\} = \frac{E\{\text{time spent in } (y_1, y_2) \text{ in a cycle} \mid Z(0) = x_0\}}{E\{\text{time duration of a cycle} \mid Z(0) = x_0\}}.$$

Using (4-23), together with (4-16) and (4-21), we get

$$\int_{y_1}^{y_2} \alpha(y \mid x_0)\,dy = \frac{\int_{y_1}^{y_2} G(x_0, y)\,dy}{\int_b^a G(x_0, \xi)\,d\xi}$$  (4-24)

so that

$$\alpha(y \mid x_0) = \frac{G(x_0, y)}{\int_b^a G(x_0, \xi)\,d\xi}, \quad b < y < a.$$  (4-25)

The stationary density of the instantaneous return process, $\alpha(y \mid x_0)$,
can be interpreted as the proportion of the mean time spent at state $y$ in one
cycle $T_i$.

Figure 4.1. Instantaneous Return Process

b.  Relation to the Cumsum Procedure

Let $N$ denote the stopping rule based on the cumsum test with
reference value $k$, stopping boundary $a$ and restarting boundary $b$,

$$N = \inf\{t : X(t) \ge a\}$$  (4-26)

where

$$X(t) = (W(t) + kt) - \min_{0 \le s \le t} (W(s) + ks)$$

is based on the observation process

$$W(t) = \mu (t - \nu)^{+} + B(t), \quad \mu > 0$$  (4-27)

where $x^{+} = \max(0, x)$, and $\mu$ defines the amount of shift in the drift of
a standard Brownian motion $B(t)$ with $B(0) = 0$, at the point of shift $\nu$.
The reference value $k$ is chosen to minimize the Delay for detection.

Before  the  shift  occurs,  the  reference  value  guarantees  that  the 
test  will  hit  the  lower  boundary  and  cause  a  restart.  Each  restart  will  force  the 
process  to  return  to  the  initial  state  xo  and  start  once  again,  thus,  the  restart 
process  can  be  considered  as  causing  an  instantaneous  return  process. 

Notice that before the shift occurs, the biased process $W(t) + kt$ is a
Brownian motion with drift $k$, while after the shift (change) in drift occurs,
it is a Brownian motion with drift $\mu + k$. Let $L$ be the number of restarts
before the shift, and let $\{N_i\}$ be the corresponding run length intervals of
the test until the shift is detected ($i = 1, \ldots, L+1$), so that

$$\sum_{i=1}^{L} N_i \le \nu < \sum_{j=1}^{L+1} N_j.$$

Here we follow the analysis given by Srivastava and Wu (Srivastava and Wu,
1990).

The average delay time is given by

$$D(t) = E_t\left\{\sum_{i=1}^{L+1} N_i - t\right\}$$

where $E_t\{\cdot\}$ denotes the expectation when the shift occurs at a fixed
time $\nu = t$. The asymptotic average delay time, or the stationary average run
length, is defined as

$$\overline{ARL}_\mu = \lim_{t \to \infty} D(t).$$  (4-28)


We denote by $ARL_\mu(x_0)$ the Average Run Length of the diffusion $X(t)$ when
the shift in the mean is $\mu$ and the initial state is $X_0 = x_0$. Similarly,
$ARL_0(0) = ARL_0$ denotes the ARL under no change, namely, the ARL when
there is no shift in the drift and the initial state is zero. Hence, $ARL_0$ is
the mean time between false alarms (with initial state zero).

Under our assumptions, the instantaneous return process caused
by the restart process will be at some stationary state, say $y$, when the shift
occurs. Denote the stationary density of this state $y$ as
$\alpha(y \mid x_0)$. Figure 4.2 is the appropriate picture to guide the
analysis. Suppose that we use this state $y$ as a new initial state for the
detecting process with shifted mean to find $ARL_\mu(y)$. Thus, the stationary
average delay time (4-28) is given by

$$\overline{ARL}_\mu(x_0) = \int_b^a ARL_\mu(y)\,\alpha(y \mid x_0)\,dy.$$  (4-29)


Notice that $\overline{ARL}_\mu(x_0)$ can be interpreted in two ways. First, as
a weighted average of ARLs under disorder over the set of all possible initial
states $y$, taking into account the effect of the distribution of the run length
before the disorder. Second, time-wise, $\overline{ARL}_\mu(x_0)$ takes the
weighted average over all possible places of shift (since for each realization
of $X(t)$, each different $y$ is related to a different shift time), conditioned
on the shift having occurred. Since $ARL_\mu(y)$ is a decreasing function of
$y$, we obtain for Page's test ($x_0 = 0$, so that every stationary state
satisfies $y \ge x_0$) that

$$\overline{ARL}_\mu(x_0) \le ARL_\mu(x_0).$$

Since the location of the change point $\nu$ is not known inside the last run
length interval $N_{L+1}$, the approximated ARL should take into account all the
possible places of shift within the last run length interval. Thus, the
approximated stationary ARL under change (Delay) is obtained from (4-28) and
(4-29), while the bias of the approximation can be obtained by

$$\text{bias}(x_0, \mu) = ARL_\mu(x_0) - \overline{ARL}_\mu(x_0).$$  (4-30)

Hence, (4-30) "measures" the effect of the point of shift in the limiting
situation for the cumsum procedure. In the following section, we will use
the theory of this section to derive the diffusion approximation to $ARL_0$,
$ARL_\mu(y)$ and $\overline{ARL}_\mu$ for the cumsum procedure. These
approximations will be used to compare and measure the accuracy of the
theoretical results derived in Chapters II and III.


C.  BROWNIAN APPROXIMATIONS TO THE ARL FUNCTIONS OF THE
CUMSUM PROCEDURES

The  approximation  to  the  run  length  functions  for  the  one-sided  Page  test 
for  an  increase  in  the  drift,  will  be  obtained  with  the  aid  of  the  following  two 
lemmas.  Before  presenting  the  lemmas,  one  key  principle  of  the  diffusion 
process  which  is  relevant  in  our  case  needs  to  be  addressed.  This  will  be  done 
in  the  following  section. 


1.  The  Reflection  Principle  (Karlin  and  Taylor,  1968) 

A Brownian motion with a reflecting boundary at zero behaves as a
standard Brownian motion in the interior of its domain $(0, \infty)$. However,
when it reaches its zero boundary, the sample path returns to the interior in
the manner of a light wave reflected from a mirror. In general, consider
$\{Z(t), t \ge 0\}$ with $Z(0) = 0$ and $Z(T) > a$ ($a > 0$) at some fixed time
$T$. Since $Z(t)$ is continuous and $Z(0) = 0$, there exists a random time
$\tau$ at which $Z(t)$ first attains the value $a$. For $t > \tau$, we reflect
$Z(t)$ about the line $z = a$ to obtain

$$X(t) = \begin{cases} Z(t) & \text{for } t < \tau \\ a - [Z(t) - a] & \text{for } t > \tau \end{cases}$$  (4-31)

(see Figure 4.3). Note that $X(T) < a$ since $Z(T) > a$. Because the probability
law of the path for $t > \tau$, given $X(\tau) = a$, is symmetrical with respect
to the values $x > a$ and $x < a$ and independent of the history prior to time
$\tau$, the reflection argument displays, for every sample path with
$Z(T) > a$, two sample paths $X(t)$ and $Z(t)$ with the same probability of
occurrence.

Figure 4.3. The Reflection Principle about Line a

The following lemma establishes the fact that the Page cumsum procedure
($X(t)$ given by (4-26)) with boundaries $(0, a)$ results in a Brownian motion
with an absorbing barrier at $a$ and a reflecting barrier at 0. The second lemma
uses the fact that the reflecting barrier is at 0 to obtain the result that,
before the disorder, the process $X(t)$ with a reflecting barrier at 0 can be
viewed as the absolute value process (set $a = 0$ in (4-31)). Thus, the
reflecting boundary phenomenon is equivalent to setting $X(t) = |Z(t)|$.

Lemma  1  (Bagshaw  and  Johnson,  1975) 

Before the shift occurs, the process $X(t)$ given by (4-26) has the same
probability law as a Brownian motion $W(t)$ given by (4-27) with drift $k$ and a
reflecting barrier at 0. □

Lemma 1 applies to any diffusion type process. It states that before the
shift occurs, $X(t) = (W(t) + kt) - \min_{0 \le s \le t}(W(s) + ks)$ and
$|W(t)|$ have the same distribution. Moreover, the distribution of the first
passage time of $X(t)$ to $a$ can be determined by finding the distribution of
the first passage time of a process with a reflecting barrier at zero to an
absorbing barrier at $a$. Thus, it is clear that after the shift occurs, $X(t)$
and $|W(t)|$ do not have the same distribution (since $a$ is an absorbing
barrier).

Using  the  results  of  lemma  1,  two  alternative  methods  can  be  used  to 
get  the  desired  approximation  for  the  ARL  function.  The  following  two 
subsections  describe  these  methods. 


2.  Direct  Calculation  of  the  ARL  Function  via  the  Functional  (4-8) 

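Let $\{X(t)\}$ be a Brownian motion on $I = [0, \infty)$ with drift $\mu$ and
variance parameter $\sigma^2$, where 0 is a reflecting boundary. Let $T_a$ be the
hitting time to level $a > 0$, and set $v(x) = E\{T_a \mid X(0) = x\}$ for
$0 \le x \le a$. Then $v(x)$ is obtained by solving the differential equation
(4-8), with the reflecting condition $v'(0) = 0$ in place of $v(b) = 0$, and is
given by (Bagshaw and Johnson, 1975) and (Karlin and Taylor, 1981):

$$v(x) = \begin{cases} \dfrac{1}{\mu}\left[a - x - \dfrac{1}{2\gamma}\left(e^{-2\gamma x} - e^{-2\gamma a}\right)\right] & \mu \ne 0 \\[2ex] \dfrac{a^2 - x^2}{\sigma^2} & \mu = 0 \end{cases}$$  (4-32)

where

$$\gamma = \mu/\sigma^2.$$

Recall that before the shift occurs, $X(t)$ is a Brownian motion with drift $k$,
and since $X_0 = 0$ (Page's cumsum test), the ARL before the shift is obtained
from (4-32) by setting $\mu = k$:

$$ARL_0 = ARL_0(0) = \begin{cases} \dfrac{1}{k}\left[a - \dfrac{1}{2\gamma}\left(1 - e^{-2\gamma a}\right)\right] & k \ne 0 \\[2ex] \dfrac{a^2}{\sigma^2} & k = 0 \end{cases}$$  (4-33)

where

$$\gamma = k/\sigma^2.$$

After the shift, $X(t)$ is a Brownian motion with drift $\mu + k$, and since the
initial state is given by $X_0 = y$ (see Figure 4.2), the ARL after the shift is
given by

$$ARL_\mu(y) = \frac{1}{\mu + k}\left[a - y - \frac{1}{2\gamma^*}\left(e^{-2\gamma^* y} - e^{-2\gamma^* a}\right)\right], \quad 0 \le y < a$$  (4-34)

where

$$\gamma^* = (\mu + k)/\sigma^2.$$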
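
For concreteness, the ARL expressions of this subsection can be coded in a few lines. The Python sketch below assumes the standard mean hitting time of a Brownian motion with a reflecting barrier at 0 and an absorbing barrier at $a$, i.e. the solution of (4-8) with $v'(0) = 0$ and $v(a) = 0$; the function names and the parameter values $a = 5$, $k = -0.5$, $\mu = 1$ are illustrative choices, not values taken from the dissertation.

```python
import math


def mean_hitting_time(x, a, drift, sigma2=1.0):
    """E{T_a | X(0) = x} for Brownian motion reflected at 0 and absorbed at a."""
    if drift == 0.0:
        return (a * a - x * x) / sigma2
    g = drift / sigma2
    return (a - x - (math.exp(-2.0 * g * x) - math.exp(-2.0 * g * a)) / (2.0 * g)) / drift


def arl0(a, k, sigma2=1.0):
    """Mean time between false alarms: drift k, initial state 0, as in (4-33)."""
    return mean_hitting_time(0.0, a, k, sigma2)


def arl_after_shift(y, a, mu, k, sigma2=1.0):
    """Delay from initial state y once the drift has become mu + k, as in (4-34)."""
    return mean_hitting_time(y, a, mu + k, sigma2)


false_alarm_spacing = arl0(5.0, -0.5)             # roughly 284.8 time units
delay = arl_after_shift(0.0, 5.0, 1.0, -0.5)      # roughly 8.0 time units
```

With a negative reference value the mean time between false alarms grows exponentially in the threshold $a$, while the delay grows only about linearly, which is the tradeoff the two parameters $k$ and $a$ control.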


3.  Calculation of the $ARL_0$ and $\overline{ARL}_\mu$ Functions using the Green
Function

Lemma 1 established the result that before the shift occurs, $X(t)$ has
the same probability law as a Brownian motion with drift $k$ and a reflecting
boundary at 0. The following lemma uses this result to transform the reflected
Brownian motion into another diffusion process, for which we can use the
theory established in the last section, namely, the use of the Green function
to derive the ARL function.

Lemma 2 (Karlin and Taylor, 1968)

Let $X(t)$ be a Brownian motion with reference value $k$ (bias) as defined in
(4-26). Then, before the shift occurs, $X(t)$ has the same probability law as the
process $|W(t)|$, where $W(t)$ is a Brownian motion with parameters

$$\mu_W(z) = (\text{sign}\, z)\,k$$
$$\sigma_W^2(z) = \sigma^2(|z|) = \text{constant}$$

for all $z$ in the state-space $I$. □

Thus, the reflecting barrier phenomenon is equivalent to setting
$X(t) = |W(t)|$, where $W(t)$ is a Brownian motion on $(-a, a)$ having the
parameters given by Lemma 2. Hence, the stopping rule (4-26) can be modified as

$$N = \inf\{t : |W(t)| \ge a\}$$

which is the first exit time from the interval $(-a, a)$. Thus, the reflected
Brownian motion which describes Page's cumsum procedure is transformed
to a nonreflected Brownian motion to which we can apply the results
obtained for regular diffusions.

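Recall the definition of the Green function given by (4-15). Then, for
the process $X(t)$ in terms of the process $W(t)$, the scale density function
(4-10) is given by

$$s(z) = e^{-2|z|k/\sigma^2}, \quad -a < z < a.$$

Since the initial value in our case (Page's procedure) is $x = 0$, we get the
Green function for the cumsum procedure (with $b = -a$) for the no change
hypothesis:

$$G(0,z) = \begin{cases} \dfrac{2\int_{-a}^{0} s(u)\,du \int_{z}^{a} s(u)\,du}{\sigma^2\, e^{-2|z|k/\sigma^2} \int_{-a}^{a} s(u)\,du} & -a < 0 \le z < a \\[3ex] \dfrac{2\int_{0}^{a} s(u)\,du \int_{-a}^{z} s(u)\,du}{\sigma^2\, e^{-2|z|k/\sigma^2} \int_{-a}^{a} s(u)\,du} & -a < z \le 0 < a. \end{cases}$$

Now define $\gamma = k/\sigma^2$. Then

$$G(0,z) = \frac{2\,e^{2|z|\gamma} \int_{-a}^{\min(0,z)} e^{-2|u|\gamma}\,du \int_{\max(z,0)}^{a} e^{-2|u|\gamma}\,du}{\sigma^2 \int_{-a}^{a} e^{-2|u|\gamma}\,du}, \quad -a < z < a$$

$$= \frac{1 - e^{2(|z| - a)\gamma}}{2\gamma\sigma^2}, \quad -a < z < a.$$  (4-35)

This result agrees with the result shown by Srivastava and Wu (Srivastava
and Wu, 1990) except that in (4-35) it is assumed that the process has a general
diffusion parameter $\sigma^2$. Hence, from (4-16), $ARL_0$ is given by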

ARLo  =  r  G(0,2V2 

J—a 
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and  a  direct  calculation  yields  the  same  result  as  given  by  (4-33).  Using  these 
results,  we  get  from  (4-25)  the  stationary  density  of  the  process  X(0  defined  by 
(4-26)  is  given  by  (y  is  a  stationary  state  of  the  process  X(f)) 


a(yl:c  =  O)  = 


G(o,y) 

J“G(0,y>iy 


0  <  y  <  a 


l_g-2r(«-y) 


a-- 


ly 


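Thus, using (4-34), the averaged ARL is obtained as

$$\overline{\mathrm{ARL}}_\mu(0) = \int_0^a \mathrm{ARL}_\mu(y)\,\alpha(y \mid x = 0)\,dy$$

$$= \frac{1}{\left[a - \dfrac{1 - e^{-2\gamma a}}{2\gamma}\right] 2(\mu + k)\gamma^*} \int_0^a \left(1 - e^{-2\gamma(a-y)}\right)\left[2\gamma^*(a-y) - e^{-2\gamma^* y} + e^{-2\gamma^* a}\right] dy$$

$$= \frac{\gamma^*\left[a^2 - \dfrac{1 - (1 + 2\gamma a)e^{-2\gamma a}}{2\gamma^2}\right] - \dfrac{e^{-2\gamma^* a}\left(1 - e^{-2\gamma a}\right)}{2\gamma} - \dfrac{1 - e^{-2\gamma^* a}}{2\gamma^*} + a\,e^{-2\gamma^* a} - \dfrac{e^{-2\gamma^* a} - e^{-2\gamma a}}{2(\gamma^* - \gamma)}}{(\gamma^*/\gamma)(\mu + k)\left(e^{-2\gamma a} + 2\gamma a - 1\right)} \tag{4-36}$$

where:

$\gamma = k/\sigma^2$ (before shift)

$\gamma^* = (\mu + k)/\sigma^2$ (after shift).

Having calculated $\mathrm{ARL}_\mu(y)$ and $\overline{\mathrm{ARL}}_\mu(0)$ yields an analytical approximation
to the bias as defined in (4-30).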
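
As a numerical cross-check of the quantities in this section, the following Python sketch integrates the no-change Green function $G(0,z) = (1 - e^{2(|z|-a)\gamma})/(2\gamma\sigma^2)$ with Simpson's rule and compares the result with the mean exit time worked out by hand; it also evaluates the stationary density and the averaged delay. The helper names, the parameter values, and the expression used for $\mathrm{ARL}_\mu(y)$ (the mean time for a Brownian motion with drift μ + k, reflected at 0, to reach a from y) are illustrative assumptions and are not taken from the text.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def green_no_change(z, a, k, sigma2):
    """G(0, z) for the no-change case; note 2*gamma*sigma^2 == 2*k."""
    gamma = k / sigma2
    return (1.0 - math.exp(2.0 * (abs(z) - a) * gamma)) / (2.0 * k)

def arl0_closed_form(a, k, sigma2):
    """Integral of G(0, z) over (-a, a), evaluated analytically."""
    gamma = k / sigma2
    return (a - (1.0 - math.exp(-2.0 * gamma * a)) / (2.0 * gamma)) / k

def stationary_density(y, a, k, sigma2):
    """Normalised G(0, y) on (0, a)."""
    gamma = k / sigma2
    norm = a - (1.0 - math.exp(-2.0 * gamma * a)) / (2.0 * gamma)
    return (1.0 - math.exp(-2.0 * gamma * (a - y))) / norm

def arl_after_shift(y, a, mu, k, sigma2):
    """Assumed ARL_mu(y): drift mu + k, reflection at 0, threshold a."""
    drift = mu + k
    gamma_star = drift / sigma2
    tail = (math.exp(-2.0 * gamma_star * y)
            - math.exp(-2.0 * gamma_star * a)) / (2.0 * gamma_star)
    return ((a - y) - tail) / drift

# Illustrative values only: symmetric case k = -mu/2 with unit variance.
a, mu, sigma2 = 3.0, 1.0, 1.0
k = -mu / 2.0

# G(0, z) is even in z, so integrate over (0, a) and double.
arl0_numeric = 2.0 * simpson(lambda z: green_no_change(z, a, k, sigma2), 0.0, a)
density_mass = simpson(lambda y: stationary_density(y, a, k, sigma2), 0.0, a)
averaged_arl = simpson(
    lambda y: arl_after_shift(y, a, mu, k, sigma2)
    * stationary_density(y, a, k, sigma2),
    0.0, a)
bias = arl_after_shift(0.0, a, mu, k, sigma2) - averaged_arl
```

Integrating over (0, a) and doubling avoids the kink of |z| at the origin, which would otherwise degrade the accuracy of Simpson's rule.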

D.  RESULTS 

Using equations (4-33) and (4-36) for calculating $\mathrm{ARL}_0$ and $\overline{\mathrm{ARL}}_\mu(0)$
respectively, the error (bias) term has been calculated via equation (4-30):

$$\mathrm{bias}(0,\mu) = \mathrm{ARL}_\mu(0) - \overline{\mathrm{ARL}}_\mu(0).$$

The results are given for the symmetric case k = -μ/2, for which $\gamma = -\mu/2\sigma^2$
(before change) and $\gamma^* = \mu/2\sigma^2$ (after change). The reason for this assumption
is that it has been shown (Bagshaw and Johnson, 1975) that this is the optimal
reference value if the objective is to minimize the $\mathrm{ARL}_\mu$ function.

Figure  4.4  illustrates  the  bias  term  as  a  function  of  the  drift.  For  lower 
values  of  the  reference  value  k  the  bias  term  is  in  the  order  of  about  10 
samples. 

Figure  4.5  illustrates  the  effect  of  the  initial  value  on  the  delay  of  the 
cumsum  procedure  for  an  initial  value  of  y  =  5. 

Figure  4.6  illustrates  the  ARL  function  for  both  the  delay  and  the  mean 
time  between  false  alarms,  as  obtained  by  using  the  Brownian  motion 
approximations.

Figure 4.4. The Bias Term as a Function of the Drift.
Symmetric Case k = -μ/2

Figure 4.5. The Effect of the Initial Point y on the Delay.
Symmetric Case k = -μ/2, y = 5 (horizontal axis: Drift)

E.  SUMMARY

In  this  chapter  an  additional  viewpoint  to  the  analysis  of  cumsum 
procedures  was  introduced  by  using  the  Brownian  motion  approximations 
for  stopping  times.  The  problem  of  determining  the  probability,  the  average, 
and  some  general  cost  function  of  the  stopping  time  was  shown  to  be  reduced 
to  a  closed  form. 

Next,  the  behavior  of  the  diffusion  process  was  investigated  for  two  cases. 
In  the  first  case,  the  cumsum  test  was  shown  to  be  modeled  as  a  diffusion 
instantaneous  return  process  which  enabled  the  derivation  of  the  stationary 
density  of  the  diffusion,  thus  representing  the  density  of  the  cumsum  process. 
In  the  second  case,  the  behavior  of  the  diffusion  process  near  a  reflecting 
boundary  was  investigated  and  shown  to  be  the  key  to  determining  the 
approximation  of  the  ARL  function  for  cumsum  procedures.  Finally,  a  new 
error  (bias)  term  was  developed  allowing  one  to  predict  the  average  error  in 
the  delay  for  detection.  Also,  a  new  procedure  of  "tuning"  the  diffusion
parameters  to  a  given  problem  was  introduced.  The  drift  parameter  was 
shown  (as  expected)  to  be  the  most  influential  parameter  for  the 
approximation.

V.  QUICKEST  DISORDER  DETECTION  METHODS:  THE  BAYESIAN  FRAMEWORK


A.  INTRODUCTION 

Consider once again the disorder formulation of (1-2), where the
observations $x_1, x_2, \ldots$ are independent random variables such that, up to a
certain time $\nu \ge 1$, $x_1, \ldots, x_{\nu-1}$ are identically distributed with distribution
$P_0(x)$, while $x_\nu, x_{\nu+1}, \ldots$ are identically distributed with another distribution
$P_1(x)$, where $P_0(x)$ and $P_1(x)$ do not depend on ν. In the non-Bayesian
formulation the change time ν is considered as a parameter, and this formulation
leads to classical problems of hypothesis testing. In the Bayesian approach, the
parameter ν is considered as a random variable with a certain distribution. As in
the non-Bayesian approach, we shall be concerned mainly with the problem of how to
use the observations to determine as quickly as possible the time ν, or the
"disorder" situation, for a given false alarm rate. Shiryayev (Shiryayev, 1978)
and Roberts (Roberts, 1966) independently proposed an approach similar to
cumsum procedures. We shall refer only to Shiryayev and use his notation.

Shiryayev solved the problem of quickest disorder detection in the Bayesian
framework, subject to a constraint on the probability of false alarm,
$\Pr\{N < \nu\} \le \alpha$ (where N is the stopping time).

The  following  section  gives  a  short  presentation  of  his  work  and  some 
important  results  which  will  be  used  next  to  establish  new  results. 

This chapter is organized as follows: In Section B we introduce
Shiryayev's results which are relevant to our case, and form the basic
underlying observation process which is used to solve the Bayes version of
the cumsum procedures. Section C presents a new approach to evaluate the
performance of the Bayes version of cumsum procedures. The analysis is
based on the Shiryayev optimal Bayes solution (Shiryayev, 1978) and a
modification of the double procedure algorithm of Assaf and Ritov (Assaf
and Ritov, 1988), and uses Brownian motion approximations to solve the
Bayes problem.

Finally,  Section  D  contains  a  short  summary  of  the  results. 

B.  BAYESIAN  APPROACH  TO  CUMSUM  PROCEDURES  APPROXIMATED 

BY  BROWNIAN  MOTION 

1.  Problem  Formulation 

As mentioned in the introduction, we follow the work of Shiryayev
(Shiryayev, 1978). Using some of Shiryayev's results, a new derivation of the
performance of the cumsum procedure is introduced in the Bayesian framework.
The problem is presented in terms of a Brownian motion process which
approximates the cumsum behavior (see Chapter IV).

Consider a Brownian motion process {W(t), t ≥ 0} which has zero drift during
the time interval [0, ν] and drift μ > 0 during (ν, ∞), where ν < ∞ and μ are
unknown parameters. The process W(t) satisfies the stochastic differential
equation

$$dW_t = \mu\,d(t - \nu)^+ + \sigma\,dB_t, \qquad W_0 = 0$$

where $(u)^+ = \max(0,u)$ and B(t) is a standard Brownian motion with B(0) = 0.
In other words, the structure of the observed process is

$$W_t = \begin{cases} \sigma B_t, & t < \nu \\ \mu(t - \nu) + \sigma B_t, & t \ge \nu \end{cases} \tag{5-1}$$

where ν is considered as the (unknown) "disorder" time at which a disorder
takes place in the observed process, and the local drift shifts from zero to μ.

In what follows we assume that ν is a nonnegative random variable with
a priori distribution

$$\Pr\{\nu = 0\} = p_0, \qquad \Pr\{\nu > t \mid \nu > 0\} = e^{-\lambda t} \tag{5-2}$$

where $p_0$ and λ are known constants. Let N be the stopping variable associated
with a detection rule φ. The class of those detection rules φ whose stopping
variable N is finite with probability one is denoted by Δ.

For every φ ∈ Δ, let

$$R(\varphi, p_0) = \Pr\{N < \nu\} + c \cdot E\{(N - \nu)^+\}$$

be the risk, consisting of the probability of a false alarm $\Pr\{N < \nu\}$ and the
average delay in detecting the disorder correctly, $E\{N - \nu \mid N \ge \nu\}$. The cost of
one observation is assumed to be c > 0. Thus, the cost of a false alarm
relative to the cost of a delay in detection is determined by the value of c.
Define

$$\rho(N^*) = \inf_{\varphi \in \Delta} R(\varphi, p_0) \tag{5-3}$$

where N* is the optimum stopping rule which minimizes the cost function.
The problem can be restated slightly: among all the rules φ ∈ Δ with a given
probability of false alarm $\alpha = \Pr\{N < \nu\}$, find a rule N* which guarantees the
minimum mean delay, given that the detection was correct, i.e., such that

$$D(\alpha) = E\{N^* - \nu \mid N^* \ge \nu\} = \inf_{\varphi \in \Delta} E\{N - \nu \mid N \ge \nu\} \tag{5-4}$$

where N* is called the Bayes time. Thus, the Bayes problem of quickest
detection can be formulated in the following way:

For a given false alarm probability $\alpha = \Pr\{N < \nu\}$, find the observation
method with the minimum average delay (which minimizes the risk (5-3)).
The following theorem establishes the optimum observation method in the
class of decision functions Δ.

2.  The  Optimum  Bayes  Solution 

Theorem (Shiryayev, 1978): For a given false alarm probability
$\Pr\{N < \nu\} \le \alpha < 1$, the optimum observation method for the problem of
minimizing the average delay as defined by (5-4) consists of observing the
process

$$Z_t = \Pr\{\nu \le t \mid W_s,\ s \le t\} \tag{5-5}$$

with the initial condition $Z_0 = p_0 < 1$ and deciding that a disorder is present
when a threshold a < 1 is first attained. Hence the stopping rule is given by

$$N = \inf\{t : t \ge 0,\ Z_t \ge a\}$$

where a = 1 - α. The process $Z_t$ satisfies the following stochastic differential
equation:

$$dZ_t = \lambda(1 - Z_t)\,dt + \frac{\mu}{\sigma} Z_t(1 - Z_t)\,dB_t, \qquad Z_0 = z. \tag{5-6}$$

□

For the proof of optimality of N, see Shiryayev (Shiryayev, 1978,
Ch. 4). The theorem gives two important results. First, the optimal Bayes
time N* is obtained by observing the current posterior probability that the
change has already occurred: the process $W_t$ is observed until the process
$Z_t$ reaches a certain level a for the first time (at time N*). Second, $Z_t$ is a
diffusion process with time homogeneous coefficients given by

$$\mu(z) = \lambda(1 - z), \qquad \sigma^2(z) = \left[(\mu/\sigma)\,z(1 - z)\right]^2 \tag{5-7}$$

where μ and σ are the time homogeneous coefficients of the observation
process (5-1). Notice that when λ → 0, i.e., when the mean time at which the
disorder occurs, $E\{\nu\} = \lambda^{-1}$, tends to infinity, μ(z) = 0 and the diffusion $Z_t$
has zero drift. Notice also that in this case it is natural to assume that α → 1.
This situation indicates that the disorder appears on the background of an
established stationary regime. Shiryayev solved the problem of quickest
detection under this assumption. For a given mean time between false
alarms T, under the optimal method of observation, the mean delay time
D(T) is given by

$$D(T) = E\{N - \nu \mid N \ge \nu\} = \frac{1}{\gamma}\left\{\log(\gamma T) - 1 - C\right\} \tag{5-8}$$

where

$\gamma = \mu^2/2\sigma^2$

C = 0.577... = Euler constant

T = (1 - α)/λ.


Notice that the assumption that the disorder is preceded by a long process of
observation in which a stationary regime is established implies that λ → 0 and
α → 1, but such that T = (1 - α)/λ is fixed.
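
The delay formula above is easy to evaluate directly. The following Python sketch assumes $\gamma = \mu^2/2\sigma^2$ and the form $D(T) = \{\log(\gamma T) - 1 - C\}/\gamma$; the function name and the numerical values are illustrative only. The expression is meaningful only when γT is large enough for the bracket to be positive.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant C

def shiryayev_delay(mu, sigma, T):
    """Mean delay D(T) in the stationary regime for drift mu, noise level
    sigma and mean time between false alarms T."""
    gamma = mu ** 2 / (2.0 * sigma ** 2)
    return (math.log(gamma * T) - 1.0 - EULER_C) / gamma

# Illustrative values: unit drift and unit noise level.
delay_1e3 = shiryayev_delay(1.0, 1.0, 1.0e3)
delay_1e6 = shiryayev_delay(1.0, 1.0, 1.0e6)
```

The delay grows only logarithmically with T: in this example, raising the mean time between false alarms by a factor of 1000 adds $\log(1000)/\gamma$ to the delay.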

The results established in this section, in particular the diffusion type
behavior of the optimal observation method (namely, the posterior
probability of change), motivate a new formulation of quickest detection
for cumsum procedures, which is presented in the next section. The analysis
makes use of the theory of diffusion processes established in Chapter IV.
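
To make the observation method concrete, the following Python sketch simulates the disorder model (5-1), (5-2) and propagates the posterior probability with a simple Euler scheme. It is a sketch under stated assumptions: the update is written in the innovation form $dZ_t = \lambda(1 - Z_t)\,dt + (\mu/\sigma^2) Z_t(1 - Z_t)(dW_t - \mu Z_t\,dt)$, which is assumed here and not spelled out in the text; the step size, the clipping to [0, 1], the function name, and the parameter values are illustrative choices.

```python
import math
import random

def simulate_posterior(mu, sigma, lam, p0, a, dt=0.01, t_max=200.0, seed=0):
    """Euler sketch of the observed process and of the posterior Z_t.

    Returns (nu, alarm_time, z_path); alarm_time is None if the
    threshold a is not reached before t_max.
    """
    rng = random.Random(seed)
    # Prior (5-2): nu = 0 with probability p0, otherwise exponential(lam).
    nu = 0.0 if rng.random() < p0 else rng.expovariate(lam)
    z = p0
    z_path = [z]
    t = 0.0
    alarm_time = None
    while t < t_max:
        # Increment of the observed process: drift mu only after nu.
        drift = mu if t >= nu else 0.0
        dw = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Assumed innovation form of the posterior update.
        dz = (lam * (1.0 - z) * dt
              + (mu / sigma ** 2) * z * (1.0 - z) * (dw - mu * z * dt))
        z = min(1.0, max(0.0, z + dz))  # keep the discretised path in [0, 1]
        t += dt
        z_path.append(z)
        if z >= a:  # stopping rule: first time Z_t reaches the threshold a
            alarm_time = t
            break
    return nu, alarm_time, z_path
```

A call such as `simulate_posterior(1.0, 1.0, 0.1, 0.0, 0.95, seed=1)` returns the sampled disorder time, the alarm time (if any), and the discretised path of the posterior.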

C.  THE  BAYES  SOLUTION  TO  CUMSUM  PROCEDURES

The framework set by Shiryayev enables a convenient formulation of the
quickest detection problem for cumsum in the Bayesian framework. Here,
the analysis of Shiryayev (Shiryayev, 1978) and Assaf and Ritov (Assaf and
Ritov, 1988) is modified to obtain a new performance analysis of the optimal
Bayesian stopping time solution for cumsum procedures.

1.  Problem  Formulation 

The behavior of cumsum procedures as processes which exhibit
renewal properties (Chapter II) and which can be described by instantaneous
return processes (Chapter IV) establishes the observation that, for a general
cumsum procedure, the process of local minima (or local maxima) results in
regimes (i.e., periods between successive local minima points) in which the
diffusion approximation has a certain drift. The disorder occurs in one of the
regimes, where the diffusion approximation will exhibit a change in the drift.
The problem of quickest detection is concerned with the minimization of the
average number of bad regimes which are mistakenly accepted during one
cycle, i.e., between two successive alarms. Notice that if the disorder occurred
in the last regime in the cycle, then the average delay is given by $\mathrm{ARL}_\mu(x_0)$ as
defined in (4-29). Let L be the number of regimes in one cycle. Thus, the first
L-1 regimes are accepted (each time the present regime is accepted, the test
continues to the next regime), while the last one is rejected and produces the
alarm. Let $X_i$ be the set of observations within regime i, i.e., $X_1$ denotes the
observation set within regime 1, etc. Assume that the (true) change occurred
in regime ν. Thus, $X_0, X_1, \ldots, X_{\nu-1}$ are independently distributed according to
some $P_0$, while $X_\nu, X_{\nu+1}, \ldots$ are independently distributed according to some $P_1$.

Assumption 1: Both $P_0$ and $P_1$ are normal distributions with known
means $\mu_0$ and $\mu_1$ and common variance $\sigma^2$.

Assumption  2:  It  is  assumed  that  the  change  occurs  only  between  regimes 
and  not  within  a  regime.  This  assumption  can  be  justified  by  the  fact  that  by 
using  the  ladder  variable  approach  it  was  shown  in  Chapter  II  that  the  process 
of  local  minima  reflects  the  set  of  time  instants  which  are  more  likely  to  be 
the  change  points.  Moreover,  it  was  shown  (2-33)  that  the  actual  number  of 
regimes  within  a  cycle  is  geometrically  distributed.  Following  Assumption  2 
we  establish  the  last  assumption. 

Assumption 3: The change regime ν has a prior which is geometrically
distributed with a known parameter 0 < p < 1, i.e., $\Pr\{\nu = n\} = p(1-p)^{n-1}$ for n ≥ 1.

Let L be a stopping time for declaring a change. Let $\alpha = \Pr\{L < \nu\}$ be the
probability of false alarm and let $D = E\{(L - \nu)^+\}$ be the expected number of
regimes which are mistakenly accepted in one cycle (i.e., which are
mistakenly identified as regimes containing "no change" information).

Consider the following version of the optimal problem: find a
stopping time L which minimizes D subject to the constraints α and $\mathrm{ARL}_0(\cdot)$,
where α is a given probability of false alarm and $\mathrm{ARL}_0(\cdot)$ is the mean time
between false alarms defined in (4-29).

2.  Optimal  Bayes  Solution 

The solution of the optimal Bayes problem is given by (5-5) and is
denoted as the Z process. For any regime ℓ, $Z_\ell$ is the "current" posterior
probability that the change has already occurred, given the observations of
regimes 0 through ℓ:

$$Z_\ell = \Pr\{\nu \le \ell \mid X_0, X_1, \ldots, X_\ell\}, \qquad \ell = 0, 1, 2, \ldots \tag{5-9}$$

Following Shiryayev's results, (5-9) is taken as the observed process, which
behaves like a diffusion process with time homogeneous coefficients given by

$$\mu(z) = 0, \qquad \sigma^2(z) = \left[(\Delta\mu/\sigma)\,z(1 - z)\right]^2 \tag{5-10}$$

where $\Delta\mu = \mu_1 - \mu_0$.

Notice that the underlying model assumes that within a regime the cumsum
behaves like a Brownian motion with drift parameter $\mu_0$ or $\mu_1$ and variance
parameter $\sigma^2$. The corresponding observed process (5-9) has zero drift
parameter because it is assumed that the change does not occur
within the regime. Notice that this assumption results in a diffusion in natural
scale, whose scale function is linear, S(z) = z.

The observed $Z_\ell$ process is defined on the state-space I = (0,1). Let
0 < b < a < 1. The optimal stopping rule is defined as follows:

•  accept the present regime and move to the next one whenever $Z_\ell \le b$.

•  continue sampling within the present regime as long as $b < Z_\ell < a$.

•  reject the present regime and declare a disorder as soon as $Z_\ell \ge a$.

Thus, the stopping regime is given by

$$L = \inf\{\ell : Z_\ell \ge a\}.$$

The initial value of Z at the first regime is $z_0 = p$, while a decision to accept a
certain regime and to move to the next one results in the initial condition

$$\tilde{z}_0 = b + p(1 - b). \tag{5-11}$$

This result is due to the fact that for any regime in the cycle except the last one
the test is terminated at the lower boundary, $Z_\ell = b$, 0 ≤ ℓ ≤ L-1. We obtain
(5-11) by using the law of total probability: the change has either already
occurred (with posterior probability b) or, with probability 1 - b, has not, in
which case it occurs at the next regime with probability p. All the regimes
following the first one have the same probabilistic behavior. Thus, their initial
z values are given by (5-11). See Figure 5.1 for a pictorial illustration.

3.  Cumsum  Performance  Analysis 

The goal of this section is to find the relationship between the test
parameters a, b, and p and the delay D and the probability of false alarm α. To
start the analysis we need to find E{L}, the average number of regimes within
a cycle. Note that L has a modified geometric distribution, since L is a mixture
of two random variables, the first of which is identically zero and the second
of which is geometric (Assaf and Ritov, 1988). Hence, the mixing probability
is calculated with the initial value $z_0$ of the first regime, while the parameter
of the geometric part is calculated with the initial value $\tilde{z}_0$ of the
following regimes.

Figure 5.1. The Observed Diffusion Process z(n). (Annotations in the figure:
Pr{z(n) = b}; Pr{z(n) = a}; Pr{hitting b before a | initial regime value = $z_0$};
Pr{hitting a before b | initial regime value = $\tilde{z}_0$}.)

Recall the results obtained for a general diffusion process: the probability of
hitting the boundary a before b satisfies equation (4-4), and its solution is
given by (4-12):

$$u(z_0) = \Pr\{T(a) < T(b) \mid Z(0) = z_0\}, \qquad b < z_0 < a.$$

Hence, E{L} can be obtained by

$$E\{L\} = \frac{1 - u(z_0)}{u(\tilde{z}_0)}. \tag{5-12}$$

Since the observed diffusion $Z_\ell$ has zero drift coefficient within any regime,
the Z process has the natural scale function S(z) = z. Thus, using (4-17) we
obtain

$$u(z_0) = \frac{z_0 - b}{a - b}, \quad b < z_0 < a; \qquad u(\tilde{z}_0) = \frac{\tilde{z}_0 - b}{a - b}, \quad b < \tilde{z}_0 < a,$$

which results in

$$E\{L\} = \frac{a - z_0}{\tilde{z}_0 - b}.$$

Using $\tilde{z}_0 - b = p(1 - b)$, the expected number of regimes per cycle is given by

$$E\{L\} = \frac{a - z_0}{p(1 - b)}. \tag{5-13}$$

Having derived an explicit form for the average number of
regimes per cycle, E{L}, we can now show the relationship between α
(probability of false alarm) and D (delay) and the test parameters a, b, and p.

To compute α, notice that when the observed process
$Z_\ell = \Pr\{\nu \le \ell \mid X_0, \ldots, X_\ell\}$ crosses the upper boundary a and causes an alarm, then
$Z_\ell = a$, or $\Pr\{\nu \le \ell \mid X_0, \ldots, X_\ell\} = a$. Thus it follows that 1 - α = a, or

$$\alpha = 1 - a. \tag{5-14}$$

To compute $D = E\{(L - \nu)^+\}$, notice that $Z_\ell$ is the expected value, using posterior
information, of the indicator function $I(\nu \le \ell)$ (Shiryayev, 1978), i.e.,

$$I(\nu \le \ell) = \begin{cases} 1, & \nu \le \ell \\ 0, & \nu > \ell \end{cases}$$

hence

$$\Pr\{I(\nu \le \ell) = 1 \mid X_0, \ldots, X_\ell\} = \Pr\{\nu \le \ell \mid X_0, \ldots, X_\ell\} = Z_\ell.$$

Thus,

$$D = E\{(L - \nu)^+\} = E\Big\{\sum_{\ell=0}^{L-1} E\{I(\nu \le \ell) \mid X_0, \ldots, X_\ell\}\Big\} = E\Big\{\sum_{\ell=0}^{L-1} Z_\ell\Big\}. \tag{5-15}$$

The result (5-15) is consistent with Shiryayev's result, but here the
derivation is done in a much simpler way. Since for 0 ≤ ℓ ≤ L-1 the
Z process terminates by crossing the lower boundary,
$Z_\ell = b$ for ℓ = 0, ..., L-1. Hence,

$$D = E\Big\{\sum_{\ell=0}^{L-1} Z_\ell\Big\} = E\{L\,E\{Z_\ell\}\} = E\{L\}\,b = \frac{(a - z_0)\,b}{p(1 - b)}. \tag{5-16}$$
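
The relations (5-11) through (5-16) can be collected in a few lines of Python. This is a minimal sketch: the function names and the numerical values are illustrative assumptions, and $E\{L\}$ is computed in the ratio form $(1 - u(z_0))/u(\tilde{z}_0)$ so that it can be compared with the closed form (5-13).

```python
def hit_upper_prob(z, a, b):
    """u(z): probability that a driftless diffusion in natural scale,
    started at z in (b, a), reaches a before b."""
    return (z - b) / (a - b)

def regime_performance(a, b, p):
    """Return (E{L}, alpha, D) for thresholds b < a and prior parameter p."""
    z0 = p                      # initial value in the first regime
    z0_next = b + p * (1.0 - b)  # initial value after an acceptance, (5-11)
    if not (b < z0 < a and b < z0_next < a):
        raise ValueError("initial values must lie strictly between b and a")
    expected_regimes = (1.0 - hit_upper_prob(z0, a, b)) / hit_upper_prob(z0_next, a, b)
    alpha = 1.0 - a                # false alarm probability, (5-14)
    delay = expected_regimes * b   # expected number of bad regimes, (5-16)
    return expected_regimes, alpha, delay

# Illustrative values only.
a, b, p = 0.9, 0.02, 0.05
expected_regimes, alpha, delay = regime_performance(a, b, p)
```

For these values, $E\{L\} = (a - z_0)/(p(1 - b)) = 0.85/0.049$, so both forms of the expected number of regimes agree.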


4.  Asymptotic  Analysis 

In this section we are concerned with the asymptotic analysis of the
delay D as p → 0. Since p determines the average rate of changes (the mean
number of regimes until a change is 1/p), this asymptotic analysis indicates
the performance of the optimal algorithm when the rate of changes is small.
We consider the constrained version of minimizing D subject to given values
of $\Pr\{L < \nu\} = \alpha$ and $\mathrm{ARL}_0(z_0) = T$, the regime time.

The analysis starts with the computation of the average cycle time E{C},
which is needed to analyze the asymptotic average delay. The diffusion type
behavior of the observed process $Z_\ell$ enables the use of techniques introduced
in Chapter IV to obtain the result for E{C}. Finally, we obtain an asymptotic
approximation for the average delay.

a. Calculation of the Mean Cycle Time E{C}

Figure 5.1 is the appropriate picture to guide the following
analysis. To compute the expected cycle time, E{C}, notice that the first regime
run length in a cycle starts with initial condition $z_0$, while all the following
regimes start with the initial condition given by (5-11). Let $N_0$ be the sampling
time of the first regime in a cycle; then $E\{N_0\}$ is the expected time it takes the
diffusion starting at $z_0$ to reach b or a. Similarly, let $\tilde{N}$ be the time needed for
the diffusion starting at $\tilde{z}_0$ to reach b or a. The total run length of a cycle is
given by $C = N_0 + \sum_{i=1}^{L-1} N_i$, where L is the stopping regime and $N_1, N_2, \ldots$ are
independent and identically distributed like $\tilde{N}$. Applying Wald's equation and
using (5-13) we obtain

$$E\{C\} = E\{N_0\} + (E\{L\} - 1)\,E\{\tilde{N}\} = E\{N_0\} + \left[\frac{a - z_0}{p(1 - b)} - 1\right] E\{\tilde{N}\}. \tag{5-17}$$

To make the computation simpler we consider the long run
situation, using the simplification $z_0 = \tilde{z}_0$. In this case (5-17) becomes

$$E\{C\} = E\{L\}\,E\{\tilde{N}\} = \frac{a - z_0}{p(1 - b)}\,E\{\tilde{N}\} = \frac{a - z_0}{p(1 - b)}\,\mathrm{ARL}_0(z_0). \tag{5-18}$$

The last result is due to the fact that $E\{\tilde{N}\}$ is the average regime time, which
is by definition equal to $\mathrm{ARL}_0(z_0)$ since within the regime the drift coefficient
is zero.


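b. Calculation of $\mathrm{ARL}_0(z)$

The observed diffusion $Z_\ell$ is in natural scale since the scale
function is linear (see (4-17)), i.e.,

$$S(z) = z.$$

Using the expression for the variance coefficient (5-10) we obtain from (4-15)
the Green function for the observed diffusion $Z_\ell$:

$$G(z,\xi) = \begin{cases} \dfrac{2(z - b)(a - \xi)}{(\Delta\mu/\sigma)^2 (a - b)\,\xi^2 (1 - \xi)^2}, & b < z < \xi < a \\[2ex] \dfrac{2(\xi - b)(a - z)}{(\Delta\mu/\sigma)^2 (a - b)\,\xi^2 (1 - \xi)^2}, & b < \xi < z < a. \end{cases}$$

The Average Run Length $\mathrm{ARL}_0(z)$ is given by
(Assaf and Ritov, 1988)

$$\mathrm{ARL}_0(z) = E\{\tilde{N}\} = \int_b^a G(z,\xi)\,d\xi = \frac{1}{(\Delta\mu/\sqrt{2}\sigma)^2 (a - b)}\left[(z - b)(2a - 1)\log\frac{a(1 - z)}{z(1 - a)} + (a - z)(1 - 2b)\log\frac{z(1 - b)}{b(1 - z)}\right].$$

For the limiting situation

$$\lim_{p \to 0} \tilde{z}_0 = \lim_{p \to 0}\{b + p(1 - b)\} = b \tag{5-19}$$

it follows that $\mathrm{ARL}_0(\tilde{z}_0) \to \mathrm{ARL}_0(b) = 0$ (as anticipated). Thus, for the
constraint $\mathrm{ARL}_0(z) = T$ to be satisfied, we need b → 0, resulting in a
0/0 situation:

$$\lim_{p \to 0,\ b \to 0} \mathrm{ARL}_0(\tilde{z}_0) = \lim_{p \to 0,\ b \to 0} \frac{1}{(\Delta\mu/\sqrt{2}\sigma)^2 (a - b)}\,(a - b - p(1 - b))(1 - 2b)\log\frac{(b + p(1 - b))(1 - b)}{b\,(1 - b - p(1 - b))}$$

$$= \lim_{p \to 0,\ b \to 0} \frac{1}{(\Delta\mu/\sqrt{2}\sigma)^2\,a}\left[a \log(1 + p/b)\right] = \frac{1}{(\Delta\mu/\sqrt{2}\sigma)^2}\log(1 + p/b). \tag{5-20}$$

Notice that $\mathrm{ARL}_0(\tilde{z}_0)$ approaches a finite value in the limit.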
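
The closed form for the regime time can be checked against a direct numerical integration of the Green function. The Python sketch below assumes the driftless diffusion with variance coefficient $[(\Delta\mu/\sigma) z(1-z)]^2$ on (b, a) and writes `snr` for Δμ/σ; the function names and numerical values are illustrative assumptions. It also checks the small-p, small-b behaviour $2\log(1 + p/b)/\mathrm{snr}^2$.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def green_z(z, xi, a, b, snr):
    """Green function G(z, xi) of the observed diffusion on (b, a)."""
    num = (z - b) * (a - xi) if z <= xi else (xi - b) * (a - z)
    return 2.0 * num / (snr ** 2 * (a - b) * xi ** 2 * (1.0 - xi) ** 2)

def arl0(z, a, b, snr):
    """Closed-form mean time to leave (b, a) when started at z."""
    first = (z - b) * (2.0 * a - 1.0) * math.log(a * (1.0 - z) / (z * (1.0 - a)))
    second = (a - z) * (1.0 - 2.0 * b) * math.log(z * (1.0 - b) / (b * (1.0 - z)))
    return 2.0 * (first + second) / (snr ** 2 * (a - b))

def arl0_numeric(z, a, b, snr):
    """Integrate G(z, xi) over (b, a), split at the kink xi = z."""
    left = simpson(lambda xi: green_z(z, xi, a, b, snr), b, z)
    right = simpson(lambda xi: green_z(z, xi, a, b, snr), z, a)
    return left + right

# Illustrative check of the closed form.
closed = arl0(0.5, 0.9, 0.1, 1.0)
numeric = arl0_numeric(0.5, 0.9, 0.1, 1.0)

# Small p and b: ARL0 at b + p(1 - b) is close to 2 log(1 + p/b) / snr^2.
p_small, b_small = 1.0e-8, 1.0e-8
z_tilde = b_small + p_small * (1.0 - b_small)
limit_value = 2.0 * math.log(1.0 + p_small / b_small)
near_limit = arl0(z_tilde, 0.9, b_small, 1.0)
```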


c. Asymptotic Delay

For the limiting situation we also obtain the following
approximations:

$$\lim_{z_0 \to 0,\ b \to 0} E\{L\} = \lim_{z_0 \to 0,\ b \to 0} \frac{a - z_0}{p(1 - b)} = \frac{a}{p} = \frac{1 - \alpha}{p} \tag{5-21}$$

and

$$\lim_{z_0 \to 0,\ b \to 0} E\{C\} = \frac{(1 - \alpha)\,\mathrm{ARL}_0(0)}{p} \tag{5-22}$$

and for the constraint $\mathrm{ARL}_0(0) = T$ we need, from (5-20),

$$\frac{b}{p} = \frac{1}{\exp\{T\,(\Delta\mu/\sqrt{2}\sigma)^2\} - 1}. \tag{5-23}$$

Substituting (5-23) in equation (5-16) for D, we obtain

$$\lim_{z_0 \to 0} D = \frac{a\,b}{p} = \frac{(1 - \alpha)\,b}{p} = \frac{1 - \alpha}{\exp\{T\,(\Delta\mu/\sqrt{2}\sigma)^2\} - 1}. \tag{5-24}$$

Hence, the asymptotic average delay is given in terms of the constraints α and
T and the signal parameters $\Delta\mu = \mu_1 - \mu_0$ and σ.

Since the ratio Δμ/σ can be regarded as a measure of the signal to noise
ratio (SNR), the average delay (5-24) can also be written as

$$D = \frac{1 - \alpha}{\exp\{(T/2)(\mathrm{SNR})^2\} - 1}. \tag{5-25}$$

Notice that in the limiting situation the average delay does not depend on p.
Once again, as for Shiryayev's result (5-8), as p → 0 and α → 1 the delay D
approaches a finite value.
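
The asymptotic relations can be evaluated as follows. This Python sketch assumes the limiting regime time $\mathrm{ARL}_0 = 2\log(1 + p/b)/\mathrm{SNR}^2$ with SNR = Δμ/σ, solves the constraint $\mathrm{ARL}_0 = T$ for the lower threshold b, and evaluates the resulting delay $(1 - \alpha)b/p$. The function names and numerical values are illustrative assumptions.

```python
import math

def lower_threshold(p, T, snr):
    """Lower threshold b that makes the limiting regime time equal to T."""
    return p / math.expm1(0.5 * T * snr ** 2)

def asymptotic_delay(alpha, T, snr):
    """Limiting delay (1 - alpha) * b / p; it does not depend on p."""
    return (1.0 - alpha) / math.expm1(0.5 * T * snr ** 2)

# Illustrative values only.
p, T, snr, alpha = 1.0e-3, 4.0, 1.0, 0.1
b = lower_threshold(p, T, snr)
regime_time = 2.0 * math.log(1.0 + p / b) / snr ** 2  # should reproduce T
delay = asymptotic_delay(alpha, T, snr)
```

Using `math.expm1` keeps the computation accurate when $T \cdot \mathrm{SNR}^2$ is small, where $\exp(x) - 1$ would lose precision.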


D.  SUMMARY 

The fact that the cumsum procedure can be viewed as a process of local
minima (or, respectively, maxima) enabled the use of the Brownian motion
approximation to the optimal observation process $Z_\ell$. With the aid of these
tools, $\mathrm{ARL}_0(\cdot)$ given by equation (5-20) and the asymptotic delay (5-25) were
derived and shown to reach finite values.

VI.  DETECTION-ESTIMATION  ALGORITHM  FOR  NOISY  DATA  WITH
ABRUPT  CHANGES  (DISCONTINUITIES)  MODELED  BY  THE  PIECEWISE
STATE-SPACE  MODEL


A.  INTRODUCTION 

Until now, all of the chapters dealt with problems of disorder as defined
for Types 1, 2, and 3 (see Chapter I). In this chapter we present a Type 4
problem, namely, an initial condition disruption problem. The use of state-space
models as descriptive models for the initial condition disruption allows
the joint estimation of the change time ν and the state-space parameter
representing the observed signal. This method seems to be efficient
compared to GLR methods for certain classes of problems, since the Kalman
filter gains and covariance matrix can be computed off-line if the state-space
matrices do not change in time. However, this is not the case for AR or
ARMA modeling in the state-space format.

The problem of detection-estimation or detection-smoothing of signals
with time-varying statistical characteristics is of great interest in many areas of
signal processing. In many cases, prior knowledge of the signal characteristics
can be used to model (using model-based techniques) the non-stationary
behavior. In this section, the statistical changes are modeled by piecewise
deterministic state-space equations with random initial conditions, and
measurements corrupted by additive Gaussian white noise (Cristi, 1988). A
particularly interesting class is the case of signals representable by
Auto-Regressive models with piecewise constant coefficients (André-Obrecht, 1988).
Also, the class of PSK (Phase Shift Keying) signals enters this category, where
the phase of the sinusoidal carrier is shifted according to the information
(Point, 1987). For PSK, the phase shift of the sinusoidal carrier can be
modeled by a change of initial condition of a state-space model that describes
the sinusoid. The goal is a non-coherent detection scheme that will recover
the piecewise constant phase.

For  such  classes  of  signal  models,  we  can  approach  the  combined 
detection-estimation  problem  as  a  combination  of:  a)  detection  of  the 
transition  points,  in  order  to  segment  the  data  field  into  compact  regions 
having  similar  characteristics  (for  example  constant  phase  in  the  PSK  signal), 
and  b)  filtering  within  the  regions  to  reconstruct  the  original  signal.  In  the 
estimation  framework  the  joint  estimation  of  the  change  time  and  the  model 
parameters  can  be  achieved. 

Previous works (Cristi, 1990) used techniques based upon the
combination of Markov Random Field (MRF) models with Recursive Least
Squares (RLS) algorithms in order to estimate the model parameters within
the regions for 1D or 2D fields. Another approach (Point, 1987) used
Kalman filtering techniques in order to estimate the change instants in PSK
signals. Here we present a new technique based on Kalman filtering,
which calculates the joint distribution of the measurements
and the change process (defined as the transition process) over a finite
length window. The approach presented in this section is based upon a
Maximum a-Posteriori (MAP) framework (Cristi and Aviv, 1991). The signal
of interest is described by a piecewise state-space model, with initial
conditions set at the beginning of each interval. By applying the Kalman
filtering technique, the two tasks of segmentation and filtering over the
segmented regions can be achieved. The method leads to hypothesis testing.
Moreover, if the state-space matrices do not change in time, the algorithm can
be implemented at a low computation cost.

This chapter is organized as follows: Section B presents the model and
the assumptions used to describe the prior needed for the algorithm. Section
C presents the algorithm derivation, and finally, Section D presents
simulation results. A short summary is given in Section E.

B.  MODEL  DESCRIPTION 
1.  Problem  Statement 

Consider the state-space model

$$x(n+1) = \begin{cases} A\,x(n) + B\,v(n), & \text{no change} \\ x_{ic}(n), & \text{change in initial conditions} \end{cases}$$

$$y(n) = c\,x(n) + w(n) \tag{6-1}$$

where $x_{ic}(n)$ is the initial condition vector at instant n, A and B are known
matrices, c is a known vector, and v(n) and w(n) are i.i.d. white Gaussian drivers
with zero mean and known covariances Q and $\sigma^2$:

$$v \sim N(0, Q), \qquad w \sim N(0, \sigma^2).$$

Notice the doubly stochastic nature of the process {x}. In this respect the
process can be described as a combination of two models: one modeling the
regions corresponding to the initial condition, and one for the state-space model
itself.

Let γ = {γ(n), n ≥ 0}, γ(n) ∈ {0,1}, be defined as the process of transitions,
i.e.,

$$\gamma(n) = \begin{cases} 0, & \text{if } x(n+1) = A\,x(n) + B\,v(n) \\ 1, & \text{if } x(n+1) = x_{ic}. \end{cases}$$
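
The model (6-1) can be simulated for a given transition sequence as in the following Python sketch. It is written with plain lists to stay self-contained; the function names, the choice Q = q_std² I, the Gaussian sampler for the initial condition, and the rotation matrix used in the example (a noiseless sinusoid, in the spirit of the PSK case) are illustrative assumptions.

```python
import math
import random

def mat_vec(M, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def simulate_piecewise(A, B, c, q_std, w_std, init_sampler, gamma, seed=0):
    """Simulate states x(n) and measurements y(n) of the piecewise model.

    gamma[n] == 0: x(n+1) = A x(n) + B v(n);  gamma[n] == 1: x(n+1) = x_ic(n).
    The driving noise v(n) has independent components of std q_std.
    """
    rng = random.Random(seed)
    x = init_sampler(rng)
    xs, ys = [], []
    for g in gamma:
        xs.append(x)
        ys.append(sum(ci * xi for ci, xi in zip(c, x)) + rng.gauss(0.0, w_std))
        if g == 1:
            x = init_sampler(rng)  # restart from a fresh initial condition
        else:
            v = [rng.gauss(0.0, q_std) for _ in range(len(B[0]))]
            x = [ax + bv for ax, bv in zip(mat_vec(A, x), mat_vec(B, v))]
    return xs, ys

# Example: a sinusoid as a 2-state rotation, with one transition at n = 5.
omega = 0.3
A = [[math.cos(omega), -math.sin(omega)],
     [math.sin(omega), math.cos(omega)]]
B = [[1.0, 0.0], [0.0, 1.0]]
c = [1.0, 0.0]

def gaussian_init(rng):
    """Initial condition drawn from N(mean, I) with an assumed mean."""
    return [rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)]

gamma = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
states, measurements = simulate_piecewise(A, B, c, 0.0, 0.1, gaussian_init, gamma)
```

With `q_std = 0.0` the state evolves deterministically between transitions, which matches the piecewise deterministic case with random initial conditions.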


2.  Model  Assumptions 

Assumptions on $x_{ic}(n)$ and γ(n) are needed.

(1) The initial conditions $x_{ic}(n)$ are assumed to be independent,

$$P(x_{ic}(n) \mid x_{ic}(n-1), \ldots, x_{ic}(0)) = P(x_{ic}(n)),$$

and $P(x_{ic}(n))$ is Gaussian with known mean and variance,

$$P(x_{ic}(n)) = N(x_{-1}, P_{-1}),$$

where $x_{-1}$ is the a priori mean and $P_{-1}$ is the covariance matrix of the vector
$x_{ic}$.

(2) Assumptions on the transition process γ(n) are

•  There exists an integer d such that at most one transition occurs in
the process γ during any interval [t-d, t].

•  The process γ is assumed to be d-Markov, in the sense that

$$P(\gamma(t) \mid \gamma(t-1), \ldots, \gamma(0)) = P(\gamma(t) \mid \gamma(t-1), \ldots, \gamma(t-d))$$

for all t ≥ d, which implies that the statistics of γ are determined by the
last d samples.


3.  Probabilistic  Model  for  the  Transition  Process  yin) 

In order to assign a probability measure to $\gamma$, define the following
"truncated" sequence:

$$\Gamma_{t,d+1} = [\gamma(t-d), \ldots, \gamma(t)]$$

where the pair (t, d+1) defines the truncation boundaries: t defines the time
index of the starting element, and d+1 defines the sequence length. In a
similar way we can define any other truncated sequence.

The vector of possible realizations of $\gamma$ is defined by

$$e_j = [e_j(0), \ldots, e_j(d)]^T, \qquad j = -1, \ldots, d$$

where

$$e_j(i) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \qquad i = 0, \ldots, d, \quad j = 0, \ldots, d$$

and

$$e_{-1} = 0.$$

The d+2 possible realizations of the vector $e_j$ contain at most a single
"1", at some location i ($0 \le i \le d$), corresponding to a change at
location i. Thus, $e_s$ represents a change inside the window at location
(t-s), while $e_{-1}$ is by definition the no-change vector.

From the assumption that $\gamma$ is a Markov process with realizations
$e_j$, we can determine its probabilistic model as

$$P(\gamma(t) = 0 \mid \Gamma_{t-1,d+1}) = \begin{cases} 1 & \text{if } \Gamma_{t-1,d+1} = e_i,\ i \ne -1 \\ p_0 & \text{if } \Gamma_{t-1,d+1} = e_{-1} \end{cases} \qquad (6\text{-}2)$$

The reason behind this equation is that $\gamma(t) = 0$ with probability 1 if a
transition exists in the interval [t-d-1, t-1], which defines the previous
"sliding window." If there was no change in the previous window
($\Gamma_{t-1,d+1} = e_{-1}$), then $\gamma(t) = 0$ with probability $p_0$, and
thus $\gamma(t) = 1$ with probability $p_1 = 1 - p_0$. Figure 6.1 shows
realizations of the process $\gamma$.


Figure 6.1. Realization Map of the Process {γ}
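The transition model (6-2) can be sampled directly. The following is a minimal sketch under the stated model only; the function name `sample_transitions` is hypothetical, and the window of the previous d+1 samples is taken from the interval [t-d-1, t-1] named in the text.

```python
import random

def sample_transitions(n_steps, d, p1, seed=0):
    """Draw gamma(0), ..., gamma(n_steps - 1) according to (6-2)."""
    rng = random.Random(seed)
    gammas = []
    for _ in range(n_steps):
        window = gammas[-(d + 1):]      # gamma(t-d-1), ..., gamma(t-1)
        if any(window):
            g = 0                       # a transition in the previous window forces 0
        else:
            g = 1 if rng.random() < p1 else 0   # no recent change: 1 w.p. p1 = 1 - p0
        gammas.append(g)
    return gammas
```

Every run of d+1 consecutive samples produced this way contains at most one transition, which is the assumption placed on the process in Section B.2.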
C.  THE DETECTION-ESTIMATION RECURSIVE ALGORITHM


In this section the detection-estimation algorithm is presented. It is based
on a Maximum a Posteriori (MAP) probability approach in order to extract the
transition process {γ} from the observations {y} and estimate the process {x}.
Notice that the transition process {γ} defines the regions of the same
probabilistic nature (constant phase in the case of PSK signals), resulting in
the segmentation task. Within this framework, the transition points t are
indicated by the process {γ} and they correspond to γ(t) = 1. The algorithm
presented here is based on a "sliding window" of length d+1 over which the
likelihood of the transitions is recursively computed using the following
Lemma 1.

1.  Basic  Lemma 

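Lemma 1. Define

$$Y_t = [y(0), \ldots, y(t)], \qquad \Gamma_t = [\gamma(0), \ldots, \gamma(t)]$$

and similarly:

$$Y_{t,d+1} = [y(t-d), \ldots, y(t)], \qquad \Gamma_{t,d+1} = [\gamma(t-d), \ldots, \gamma(t)],$$

$$Y_{t-1,d+1} = [y(t-d-1), \ldots, y(t-1)], \qquad Y_{t-1,d} = [y(t-d), \ldots, y(t-1)];$$

then

$$P(Y_{t,d+1}, \Gamma_{t,d+1} \mid Y_{t-d-1}, \Gamma_{t-d-1}) = \frac{P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1})\, P(\gamma(t) \mid \Gamma_{t-1,d+1})}{P(y(t-d-1) \mid Y_{t-d-2}, \Gamma_{t-d-1})\, P(\gamma(t-d-1) \mid \Gamma_{t-d-2})} \cdot P(Y_{t-1,d+1}, \Gamma_{t-1,d+1} \mid Y_{t-d-2}, \Gamma_{t-d-2}) \qquad (6\text{-}3)$$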

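Proof: By Bayesian factorization:

$$P(Y_{t,d+1}, \Gamma_{t,d+1} \mid Y_{t-d-1}, \Gamma_{t-d-1}) = P(Y_{t,d+1} \mid \Gamma_{t,d+1}; Y_{t-d-1}; \Gamma_{t-d-1}) \cdot P(\Gamma_{t,d+1} \mid Y_{t-d-1}; \Gamma_{t-d-1}) \qquad (6\text{-}4)$$

The left-most probability term can be further factored:

$$P(Y_{t,d+1} \mid \Gamma_{t,d+1}; Y_{t-d-1}; \Gamma_{t-d-1}) = \frac{P(y(t) \mid Y_{t-1}; \Gamma_{t,d+1}; \Gamma_{t-d-1})\, P(Y_{t-1,d+1} \mid Y_{t-d-2}; \Gamma_{t,d+1}; \Gamma_{t-d-1})\, P(Y_{t-d-2}; \Gamma_{t,d+1}; \Gamma_{t-d-1})}{P(y(t-d-1) \mid Y_{t-d-2}; \Gamma_{t,d+1}; \Gamma_{t-d-1})\, P(Y_{t-d-2}; \Gamma_{t,d+1}; \Gamma_{t-d-1})}$$

Because of the Markov property of the γ(t) process, it is clear that in the
conditioned probability terms, once $\Gamma_{t,d+1}$ is known,
$\Gamma_{t-d-1}$ is redundant. Furthermore, since $Y_{t-1,d+1}$ is independent
of γ(t), the conditioned probability term in the numerator becomes

$$P(Y_{t-1,d+1} \mid Y_{t-d-2}; \Gamma_{t-1,d+1}; \Gamma_{t-d-2}).$$

Notice also that in the denominator y(t-d-1) is independent of
$\Gamma_{t,d+1}$; hence, the left-most expression in (6-4) becomes:

$$P(Y_{t,d+1} \mid \Gamma_{t,d+1}; Y_{t-d-1}; \Gamma_{t-d-1}) = \frac{P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1})}{P(y(t-d-1) \mid Y_{t-d-2}, \Gamma_{t-d-1})}\, P(Y_{t-1,d+1} \mid Y_{t-d-2}; \Gamma_{t-1,d+1}; \Gamma_{t-d-2}) \qquad (6\text{-}5)$$

The right-most probability term in (6-4) can also be further factored:

$$P(\Gamma_{t,d+1} \mid Y_{t-d-1}; \Gamma_{t-d-1}) = \frac{P(\gamma(t) \mid \Gamma_{t-1,d}; \Gamma_{t-d-1}; Y_{t-d-1})\, P(\Gamma_{t-1,d}; \gamma(t-d-1) \mid \Gamma_{t-d-2}; Y_{t-d-1})}{P(\gamma(t-d-1) \mid \Gamma_{t-d-2}; Y_{t-d-1})}$$

By the Markov property of γ, and since the transitions do not depend on the
past observations, this reduces to

$$P(\Gamma_{t,d+1} \mid Y_{t-d-1}; \Gamma_{t-d-1}) = \frac{P(\gamma(t) \mid \Gamma_{t-1,d+1})}{P(\gamma(t-d-1) \mid \Gamma_{t-d-2})}\, P(\Gamma_{t-1,d+1} \mid Y_{t-d-2}; \Gamma_{t-d-2}) \qquad (6\text{-}6)$$

Therefore, inserting (6-5) and (6-6) into (6-4) yields the desired recursion
(6-3).  □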

By the recursion (6-3) we can update the statistics of the processes {y}
and {γ} over the interval [t-d,t] conditioned on past values of these processes.

2.  Likelihood  Function  of  the  Transition  Process 

Using the probability model of the last section, the transition process
{γ} can be estimated at time (t-d), i.e., the edge of the window, on the basis of
the observations up to time t (i.e., using the data within the window [t-d,t]),
by using the Kalman filter technique.

The rationale behind this can be explained as follows. Suppose there
was no (true) change in the signal's model. If the initial condition of the filter
is changed at any time instant within the interval [t-d,t], then there will be a
probabilistic mismatch between the true signal and the estimated one (see
Figure 6.2a). Suppose now that the initial condition is changed at the same
time instant the true change occurred. Then forcing the change to be
evaluated at the edge of the sliding window, namely at time t-d, creates a
probabilistic match between the true signal and the predicted signal that relies
on the maximum number of available observations (d) (see Figure 6.2b).


Figure 6.2. Relationship between Estimated Signal and True Signal

a. Probabilistic Mismatch
b. Probabilistic Match
The proposed algorithm can be viewed in light of this interpretation
as follows: at each time instant t we calculate d+2 likelihood terms
$l_t(e_j)$, one for each possible realization of {γ}. Of these, d+1 are
associated with the assumption that the change occurred at one of the d+1
possible locations within the window [t-d,t], and one corresponds to the
"no change" hypothesis (see Figure 6.3).


Figure 6.3. Calculation of the d+2 Likelihood Terms


The algorithm "looks" at all the possible realizations of {γ} within the
window [t-d,t] and decides about a change in a way which will be described in
the next section.

3.  Recursive  Detection 

The  recursive  detection  algorithm  is  based  upon  the  likelihood  terms 
as  described  in  the  last  section. 

Define the MAP estimate of γ as:

$$\hat{\gamma}(t-d) = \arg\max_{\gamma(t-d)} \{ P(\gamma(t-d) \mid Y_t, \Gamma_{t-d-1}) \}. \qquad (6\text{-}7)$$

By standard Bayesian factorization, it is easy to see that

$$P(\gamma(t-d) \mid Y_t, \Gamma_{t-d-1}) = P(Y_{t,d+1}, \gamma(t-d) \mid Y_{t-d-1}, \Gamma_{t-d-1}) \cdot \frac{1}{P(Y_{t,d+1} \mid Y_{t-d-1}, \Gamma_{t-d-1})}.$$

Since the rightmost term does not depend on γ(t-d), maximization of (6-7) is
equivalent to the maximization of

$$P(Y_{t,d+1}, \gamma(t-d) \mid Y_{t-d-1}, \Gamma_{t-d-1}). \qquad (6\text{-}8)$$

The likelihood term in (6-8) can be recursively determined from the
probability relation (6-3) given by Lemma 1.

In order to achieve the maximization efficiently, recall that
$\Gamma_{t,d+1}$ assumes only the realizations $e_j$ (j = -1, ..., d), since at
most one transition occurs within any interval [t-d, t]. Thus, the probability
of "no change at (t-d)" (γ(t-d) = 0) is the union of all the possible mutually
exclusive events of a change occurring at each of the other time instants
within the window (j = 0, ..., d-1), including the event of no change (j = -1).

The hypothesis of no change is given by

$$H_0:\ P(Y_{t,d+1}, \gamma(t-d) = 0 \mid Y_{t-d-1}, \Gamma_{t-d-1}) = \sum_{j=-1}^{d-1} l_t(e_j) \qquad (6\text{-}9)$$

and the hypothesis of change is given by

$$H_1:\ P(Y_{t,d+1}, \gamma(t-d) = 1 \mid Y_{t-d-1}, \Gamma_{t-d-1}) = l_t(e_d) \qquad (6\text{-}10)$$

where

$$l_t(e_j) = P(Y_{t,d+1}, \Gamma_{t,d+1} = e_j \mid Y_{t-d-1}, \Gamma_{t-d-1}). \qquad (6\text{-}11)$$

Hence, the maximization (6-7) becomes the hypothesis testing problem

$$\sum_{j=-1}^{d-1} l_t(e_j) \ \underset{H_1}{\overset{H_0}{\gtrless}}\ l_t(e_d) \qquad (6\text{-}12)$$

where each likelihood term is given by (6-11) and calculated via the recursion
(6-3). Equation (6-12) evaluates the likelihood $l_t(e_d)$ of a change at t-d
against all the other possible changes within the window. Hence, the vector
$e_j$ can also be viewed as an indicator vector of the change assumption (or
re-initialization location of the Kalman filter).

4.  Implementation 

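By careful examination of the recursion (6-3), it is clear that the
denominator is constant with respect to the transition sequence
$\Gamma_{t,d+1}$. Furthermore, the right-most term in the numerator is given
by (see Figure 6.1)

$$P(\gamma(t) \mid \Gamma_{t-1,d+1}) = \begin{cases} 1 & \text{if } \Gamma_{t-1,d+1} = e_i,\ i \ne -1, \quad \gamma(t) = 0 \\ p_0 & \text{if } \Gamma_{t-1,d+1} = e_{-1}, \quad \gamma(t) = 0 \\ p_1 & \text{if } \Gamma_{t-1,d+1} = e_{-1}, \quad \gamma(t) = 1 \end{cases}$$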

Now, notice that the left-most term in the numerator of (6-3) is computed
directly from the Kalman filter equations, since
$P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_j)$ is equal to
$P(y(t) \mid Y_{t-1,j})$ for $j \ge 1$, given that the filter was reinitialized
at time t-j. Thus, the "past" sequence $Y_{t-1,j}$ is the "truncated past"
{y(t-1) ... y(t-j)} and contains all the past observations since the filter's
initialization. Hence, calculating (6-11) via (6-3) becomes:

$$l_t(e_j) = C\, P(y(t) \mid Y_{t-1,j})\, P(\gamma(t) \mid \Gamma_{t-1,d+1})\, l_{t-1}(\Gamma_{t-1,d+1}) \qquad (6\text{-}13)$$

where C is a constant independent of $\Gamma_{t,d+1}$.

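Using the realization map of {γ} (Figure 6.1), the update phase of the
algorithm can be calculated as follows:

Update:

1. If $\hat{\gamma}(t-d-1) = 0$, then

   j = -1:  $l_t(e_{-1}) = C\, P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_{-1}) \cdot p_0 \cdot l_{t-1}(e_{-1})$

   j = 0:  $l_t(e_0) = C\, P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_0) \cdot p_1 \cdot l_{t-1}(e_{-1})$    (6-14)

   for $1 \le j \le d$:  $l_t(e_j) = C\, P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_j) \cdot l_{t-1}(e_{j-1})$

2. If $\hat{\gamma}(t-d-1) = 1$, then

   $l_t(e_{-1}) = C\, P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_{-1}) \cdot l_{t-1}(e_d)$    (6-15)

   $l_t(e_j) = 0 \quad \forall j \ne -1.$

Change detection:

at each time instant t check for:

$$\sum_{j=-1}^{d-1} l_t(e_j) \ \underset{\hat{\gamma}(t-d) = 1\ \text{(change)}}{\overset{\hat{\gamma}(t-d) = 0\ \text{(no change)}}{\gtrless}}\ l_t(e_d) \qquad (6\text{-}16)$$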

The procedure introduced above can be represented on the graph shown in
Figure 6.4.

At each time instant t, the nodes marked as j = -1, 0, ..., d refer to
the realizations of γ and the corresponding likelihood terms $l_t(e_j)$. These
likelihood terms are updated at each node according to
equations (6-14) and (6-15).
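The node updates and the decision rule can be sketched compactly. This is a minimal illustration rather than the author's implementation: the names `update_likelihoods` and `detect_change` are hypothetical, the predictive densities P(y(t) | ...) for each hypothesis are assumed to be supplied by the caller in `pred`, the constant C is chosen here so that the terms sum to one, and the branch taken after a detected change is assumed to carry forward the term of hypothesis e_d.

```python
def update_likelihoods(prev, pred, p0, change_at_edge):
    """One update of the d+2 likelihood terms.

    prev[k] and pred[k] refer to hypothesis j = k - 1 (j = -1, 0, ..., d).
    change_at_edge is the previous decision gamma_hat(t-d-1).
    """
    n = len(prev)                               # n = d + 2
    new = [0.0] * n
    if change_at_edge:                          # case 2, Equation (6-15)
        new[0] = pred[0] * prev[n - 1]          # assumed to restart from e_d
    else:                                       # case 1, Equation (6-14)
        new[0] = pred[0] * p0 * prev[0]         # j = -1: still no change
        new[1] = pred[1] * (1.0 - p0) * prev[0] # j = 0: change at time t
        for k in range(2, n):                   # 1 <= j <= d: shift in the window
            new[k] = pred[k] * prev[k - 1]
    total = sum(new)                            # normalisation plays the role of C
    return [v / total for v in new] if total > 0 else new

def detect_change(terms):
    """Equation (6-16): return 1 when l_t(e_d) exceeds the sum of the other terms."""
    return 1 if terms[-1] > sum(terms[:-1]) else 0
```

Normalising at every step keeps the terms from underflowing over long records and does not affect the comparison in (6-16), since both sides are scaled by the same factor.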
Updating of these equations is done on the basis of the transition
probability, i.e., the conditional probability terms $P(y(t) \mid Y_{t-1,j})$,
which are determined by the well-known properties of the Kalman filter
(Anderson and Moore, 1979).

Let $x_j$ be the state vector of the state-space model at node j. At
each node j of the graph, we update the estimate of $x_j$ (i.e., $\hat{x}_j$)
and consequently the probability terms, as follows:

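1. If j = 1, ..., d, then

Time update (use estimate of filter j-1):

$$\hat{x}_j(t \mid t-1) = A\, \hat{x}_{j-1}(t-1 \mid t-1)$$

$$V_j(t \mid t-1) = A\, V_{j-1}(t-1 \mid t-1)\, A^T + B Q B^T \qquad (6\text{-}17)$$

Observation update (filtering), with $l_j(t)$ denoting the Kalman gain of
filter j:

$$\hat{x}_j(t \mid t) = \hat{x}_j(t \mid t-1) + l_j(t)\,[y(t) - c^T \hat{x}_j(t \mid t-1)]$$

$$l_j(t) = V_j(t \mid t-1)\, c\, [c^T V_j(t \mid t-1)\, c + \sigma^2]^{-1}$$

$$V_j(t \mid t) = [I - l_j(t)\, c^T]\, V_j(t \mid t-1). \qquad (6\text{-}18)$$

2. If j = 0, then initialize the filter:

$$\hat{x}_0(t \mid t) = x_{-1}, \qquad V_0(t \mid t) = P_{-1},$$

$x_{-1}$ and $P_{-1}$ being the initial state and initial filter error
covariance matrix, respectively.

3. If j = -1, then $\hat{x}_{-1}(t \mid t)$ is updated as in (6-17), (6-18)
with the index change (see Figure 6.1)

$$j - 1 \rightarrow \begin{cases} -1 & \text{if } \hat{\gamma}(t-d-1) = 0 \\ d & \text{if } \hat{\gamma}(t-d-1) = 1 \end{cases}$$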

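The state-space representation and the Kalman filter yield an efficient
algorithm for the desired transition distribution
$P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_j)$:

$$P(y(t) \mid Y_{t-1}, \Gamma_{t,d+1} = e_j) = (2\pi S_j(t))^{-1/2} \exp\left\{ -\frac{(y(t) - c^T \hat{x}_j(t \mid t-1))^2}{2 S_j(t)} \right\}$$

$$S_j(t) = c^T V_j(t \mid t-1)\, c + \sigma^2 \qquad (6\text{-}19)$$

namely, the desired distributions $f_j(t)$ are Gaussian with mean
$c^T \hat{x}_j(t \mid t-1)$ and variance $S_j(t)$. Hence, the likelihood terms
(6-14) and (6-15) can be updated on-line by the following equation:

$$l_t(e_j) = C \cdot p_1 \prod_{i=0}^{j} P(y(t-i) \mid Y_{t-i-1}, e_{j-i}) = C\, p_1 \prod_{i=0}^{j} f_{j-i}(t-i), \qquad 1 \le j \le d.$$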
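For a scalar state, one pass through the time update (6-17), the innovation density (6-19) and the observation update (6-18) can be sketched as below. The function name `kalman_step` and the scalar parameters `a`, `b`, `c` are assumptions made for this illustration; the vector case replaces the products by the matrix expressions above.

```python
import math

def kalman_step(x_prev, v_prev, y, a, b, c, q, sigma2):
    """Scalar Kalman step; returns (filtered mean, filtered variance, density of y)."""
    x_pred = a * x_prev                      # time update of the mean, (6-17)
    v_pred = a * v_prev * a + b * q * b      # time update of the variance, (6-17)
    s = c * v_pred * c + sigma2              # innovation variance S_j(t), (6-19)
    innov = y - c * x_pred                   # innovation y(t) - c x(t|t-1)
    density = math.exp(-0.5 * innov * innov / s) / math.sqrt(2.0 * math.pi * s)
    gain = v_pred * c / s                    # Kalman gain, (6-18)
    x_filt = x_pred + gain * innov           # observation update of the mean
    v_filt = (1.0 - gain * c) * v_pred       # observation update of the variance
    return x_filt, v_filt, density
```

The returned density is the quantity each node multiplies into its likelihood term, so a bank of d+2 such steps per sample supplies everything the recursion needs.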

The presented algorithm lends itself to a parallel structure
implementation. Furthermore, if the state-space matrices are not time
dependent, the Kalman filter gains and covariance matrices $V_j$ can be
precomputed, since in this case the filter's performance is known a priori and
is not data dependent. Hence, lookup tables can be prepared, resulting in an
algorithm of low computational cost.
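The data independence of the gains can be seen in a short sketch: the variance recursion never touches the observations, so the whole gain schedule after a reinitialization can be tabulated once. The scalar setting and the name `precompute_gains` are assumptions made for this illustration.

```python
def precompute_gains(n_steps, v0, a, b, c, q, sigma2):
    """Tabulate scalar Kalman gains and filtered variances after a reinitialization.

    v0 is the variance assigned at reinitialization; no observation is used.
    """
    gains, variances = [], []
    v = v0
    for _ in range(n_steps):
        v_pred = a * v * a + b * q * b                    # time update of the variance
        gain = v_pred * c / (c * v_pred * c + sigma2)     # gain for this filter age
        v = (1.0 - gain * c) * v_pred                     # filtered variance
        gains.append(gain)
        variances.append(v)
    return gains, variances
```

Entry k of the table serves the filter that was reinitialized k+1 steps ago, so the d+2 filters in the window can share one lookup table.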
D.  RESULTS

The algorithm was implemented and tested on different signal models, such as
piecewise constant models, PSK models and AR models, at different
signal-to-noise ratios. The results obtained show that the transitions are
estimated by the algorithm.

Figure 6.5 illustrates the results obtained for detecting the transitions in
piecewise constant signals represented by the state-space model

$$x(n+1) = x(n)$$

$$y(n) = x(n) + w(n)$$


Figure  6.5.  The  Joint  Detection-Estimation  of  a  Piecewise  Constant  Signal. 

a.  Noisy  Data 
b.  Filtered  Data 
Figure 6.7. PSK Signal with Input SNR of -9 dB.

a.  Noisy  Data 

b.  Filtered  Data 

c.  Estimated  Transitions 
Figure 6.8 illustrates data obtained by using an AR model. Figure 6.9
illustrates the estimated transitions, while Figures 6.10 and 6.11 illustrate
the true and estimated AR parameters.


Figure 6.9. Estimated Transitions

Figure 6.10. True $x_1$ and Estimated $\hat{x}_1$ AR Parameter


Figure 6.11. True $x_2$ and Estimated $\hat{x}_2$ AR Parameter

E.  SUMMARY

The problem of joint detection-estimation is addressed in this chapter.
The state-space representation of the signal model allows joint
detection-estimation by using the Kalman filter properties. Furthermore, the
detection is completely asynchronous. Since the algorithm is based on optimal
estimation techniques, it is expected to be able to detect signals in the
presence of very low SNR. However, the simulation results do not permit this
type of conclusion. A detailed performance analysis of this algorithm is not
yet available and requires more research.
VII.  CONCLUSIONS


This  dissertation  investigated  different  types  of  disorder  problems  by 
using  sequential  procedures  for  on-line  implementation.  The  problem  was 
considered  within  the  framework  of  detecting  changes  in  statistical  models  of 
an  observed  random  process  when  the  disorder  can  occur  at  unknown  times. 
The  focus  of  this  work  was  on  quickest  detection  methods  for  cumsum 
procedures  implemented  for  different  parametric  and  nonparametric 
nonlinearities. In this context, several issues remain unresolved. In
particular, for a multiple disorder problem or for transient detection, a
critical issue is the joint estimation of the disorder time and the model
parameters. There is much more to do in investigating this problem by
implementing recursive identification procedures together with detection
procedures.

In Chapter III, detecting energy changes in the Energy Spectral Density of a
signal was considered; such changes reflect different spectral signatures and
are of interest in many applications. More work can still be done in the
theoretical domain in order to examine the coupling effects between window
sizes and averaging methods within a window on the one hand, and the root
location and the minimal SNR needed for detection on the other. Moreover,
modern spectral energy estimators might be considered rather than the
traditional periodogram.

The detection algorithm which was presented in Chapter VI has the
advantage of being noncoherent, in contrast to coherent detectors for PSK-type
signals. Even though the algorithm is optimal in the sense that optimal
techniques (Kalman filtering) were used, there is still room for investigating
its performance as a function of window length. Also, the merits of this
approach should be compared to traditional detection methods for PSK signals
to check the tradeoff between noncoherent and coherent detection.

The disorder problem can be considered a local problem. Thus,
conventional time-frequency methods for detecting and estimating the
change parameters face the tradeoff between time resolution and frequency
resolution. It seems that the wavelet representation, which has become popular
recently, has the potential to resolve this time-frequency resolution problem.

Finally,  the  research  can  be  extended  to  the  situation  where  the 
measurements  are  dependent  (Sadowsky,  1989)  for  providing  a  parallel 
framework  for  evaluating  Page  test  performance. 
APPENDIX.  BASIC CONCEPTS OF HYPOTHESIS TESTING AND

DETECTION THEORY
(FROM KASSAM, 1988)

Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector of observations with joint
probability density function (pdf) $p_X(x \mid \theta)$, where $\theta$ is a
parameter of the density function. Any specific realization
$x = (x_1, x_2, \ldots, x_n)$ of X will be a point in $\mathbb{R}^n$, where
$\mathbb{R}$ is the set of all real numbers. In binary hypothesis-testing
problems we have to decide between one of two hypotheses, which we will
label as $H_0$ and $H_1$, about the pdf $p_X(x \mid \theta)$, given an
observation vector in $\mathbb{R}^n$. Let $\Theta$ be the set of all possible
values of $\theta$; we usually identify $H_0$ with one subset $\Theta_{H_0}$ of
$\theta$ values and $H_1$ with a disjoint subset $\Theta_{H_1}$, so that
$\Theta = \Theta_{H_0} \cup \Theta_{H_1}$. This may be expressed formally as

$$H_0:\ X \text{ has pdf } p_X(x \mid \theta) \text{ with } \theta \in \Theta_{H_0} \qquad (A\text{-}1)$$

$$H_1:\ X \text{ has pdf } p_X(x \mid \theta) \text{ with } \theta \in \Theta_{H_1}. \qquad (A\text{-}2)$$

If $\Theta_{H_0}$ and $\Theta_{H_1}$ are made up of single elements, say
$\theta_{H_0}$ and $\theta_{H_1}$, respectively, we say that the hypotheses are
simple; otherwise the hypotheses are composite. If $\Theta$ can be viewed as a
subset of $\mathbb{R}^p$ for a finite integer p, the pdf $p_X(x \mid \theta)$
is completely specified by the finite number p of real components of $\theta$,
and we say that our hypotheses are parametric.

A test for the hypothesis $H_0$ against $H_1$ may be specified as a partition
of the sample space $S = \mathbb{R}^n$ of observations into disjoint subsets
$S_{H_0}$ and $S_{H_1}$, so that x falling in $S_{H_0}$ leads to acceptance of
$H_0$, with $H_1$ accepted otherwise. This may also be expressed by a test
function $\delta(x)$ which is defined to have value $\delta(x) = 1$ for
$x \in S_{H_1}$ and value $\delta(x) = 0$ for $x \in S_{H_0}$. The value of the
test function is defined to be the probability with which the hypothesis
$H_1$, the alternative hypothesis, is accepted. The hypothesis $H_0$ is called
the null hypothesis.

More generally, the test function can be allowed to take on probability
values in the closed interval [0,1]. A test based on a test function taking on
values inside [0,1] is called a randomized test.

The power function $\mathcal{P}(\theta \mid \delta)$ of a test based on a test
function $\delta$ is defined for $\theta \in \Theta_{H_0} \cup \Theta_{H_1}$ as

$$\mathcal{P}(\theta \mid \delta) = E\{\delta(X) \mid \theta\} = \int_{\mathbb{R}^n} \delta(x)\, p_X(x \mid \theta)\, dx. \qquad (A\text{-}3)$$

Thus it is the probability with which the test will accept the alternative
hypothesis $H_1$ for any particular parameter value $\theta$. When $\theta$ is
in $\Theta_{H_0}$ the value of $\mathcal{P}(\theta \mid \delta)$ gives the
probability of an error, that of accepting $H_1$ when $H_0$ is correct. This is
called a type I error or the probability of false alarm, and depends on the
particular value of $\theta$ in $\Theta_{H_0}$. The size of a test is the
quantity

$$\alpha = \sup_{\theta \in \Theta_{H_0}} \mathcal{P}(\theta \mid \delta) \qquad (A\text{-}4)$$

which may be considered as being the best upper bound on the type I error
probability of the test.
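As a concrete instance of (A-3) and (A-4), consider a hypothetical one-sided test on the mean of n independent unit-variance Gaussian observations that accepts the alternative when the sample mean exceeds a threshold `tau`. The example, the function names, and the use of a finite grid of null values to approximate the supremum are all assumptions made for this sketch.

```python
import math

def q_function(z):
    """Tail probability of a standard normal variable, Q(z) = P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def power(theta, tau, n):
    """Power function of the test 'accept H1 when the sample mean exceeds tau'.

    The sample mean of n observations N(theta, 1) is N(theta, 1/n).
    """
    return q_function((tau - theta) * math.sqrt(n))

def size(null_thetas, tau, n):
    """Largest type I error probability over a grid of null parameter values."""
    return max(power(theta, tau, n) for theta in null_thetas)
```

For the null set θ ≤ 0 the power function is increasing in θ, so the supremum in (A-4) is attained at θ = 0 and a grid containing 0 returns the exact size.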

Similarly, we define the Operating Characteristic (OC) of a test,
$Q(\theta \mid \delta)$, based on a test function $\delta$, as the probability
with which a test will accept the null hypothesis $H_0$ for any particular
parameter value $\theta$. When $\theta$ is in $\Theta_{H_0}$ the value of
$Q(\theta \mid \delta)$ gives the confidence $(1-\alpha)$, that of accepting
$H_0$ when $H_0$ is correct. When $\theta$ is in $\Theta_{H_1}$ the value of
$Q(\theta \mid \delta)$ gives the probability of miss $(\beta)$, that of
accepting $H_0$ when $H_1$ is correct. This is called a type II error and
depends on the particular value of $\theta$ in $\Theta_{H_1}$. Figure A.1
illustrates a typical OC function of a test.


Figure  A.l.  A  Typical  Operating  Characteristic  Function  of  a  Test 


In signal detection the null hypothesis is often a noise-only hypothesis,
and the alternative hypothesis expresses the presence of a signal in the
observations. For a detector D implementing a test function $\delta(x)$ the
power function evaluated for any $\theta$ in $\Theta_{H_1}$ gives a probability
of detection of the signal. Thus, we will use the notation
$\mathcal{P}(\theta \mid D)$ for the power function of a detector D, and in
discussing the probability of detection at a particular value of the parameter
$\theta$ in $\Theta_{H_1}$ (or for a simple alternative hypothesis $H_1$) we
will use for it the notation $P_D$. The size of a detector is often called its
false-alarm probability. This usage is encountered specifically when the
noise-only null hypothesis is simple, and the notation for this probability is
$P_{FA}$.

A.  MOST  POWERFUL  TESTS  AND  THE  NEYMAN-PEARSON  LEMMA 

Given a problem of binary hypothesis testing such as defined by (A-1) and
(A-2), the question arises as to how one may define and then construct an
optimum test. Ideally, one would like to have a test for which the power
function $\mathcal{P}(\theta \mid \delta)$ has values close to zero for
$\theta$ in $\Theta_{H_0}$, and has values close to unity for $\theta$ in
$\Theta_{H_1}$. These are, however, conflicting requirements. We can instead
impose the condition that the size $\alpha$ of any acceptable test be no larger
than some reasonable level $\alpha_0$, and subject to this condition look for a
test for which $\mathcal{P}(\theta \mid \delta)$, evaluated at a particular
value $\theta_{H_1}$ of $\theta$ in $\Theta_{H_1}$, has its largest possible
value. Such a test is most powerful at level $\alpha_0$ in testing $H_0$
against the simple alternative $\theta = \theta_{H_1}$ if its test function
$\delta^*(x)$ satisfies

$$\sup_{\theta \in \Theta_{H_0}} \mathcal{P}(\theta \mid \delta^*) \le \alpha_0 \qquad (A\text{-}5)$$

$$\mathcal{P}(\theta_{H_1} \mid \delta^*) \ge \mathcal{P}(\theta_{H_1} \mid \delta) \qquad (A\text{-}6)$$

for all other test functions $\delta(x)$ of size less than or equal to
$\alpha_0$. In most cases of interest a most powerful level $\alpha_0$ test
satisfies (A-5) with equality, so that its size is $\alpha = \alpha_0$.

For a simple null hypothesis $H_0$ when $\theta = \theta_{H_0}$ is the only
parameter value in $\Theta_{H_0}$, the condition (A-5) becomes
$\mathcal{P}(\theta_{H_0} \mid \delta^*) \le \alpha_0$ or
$P_{FA} \le \alpha_0$, subject to which $P_D$ at $\theta = \theta_{H_1}$ is
maximized. For this problem of testing a simple $H_0$ against a simple $H_1$,
a fundamental result of Neyman and Pearson (called the Neyman-Pearson lemma)
gives the structure of the most powerful test. We state the result here as a
theorem:

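Theorem 1: Let $\delta(x)$ be a test function of the form

$$\delta(x) = \begin{cases} 1 & \text{if } p_X(x \mid \theta_{H_1}) > t\, p_X(x \mid \theta_{H_0}) \\ r(x) & \text{if } p_X(x \mid \theta_{H_1}) = t\, p_X(x \mid \theta_{H_0}) \\ 0 & \text{if } p_X(x \mid \theta_{H_1}) < t\, p_X(x \mid \theta_{H_0}) \end{cases} \qquad (A\text{-}7)$$

for some constant $t \ge 0$ and some function r(x) taking on values in [0,1].
Then the resulting test is most powerful at level equal to its size for
$H_0: \theta = \theta_{H_0}$ versus $H_1: \theta = \theta_{H_1}$.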

In addition to the above sufficient condition for a most powerful test it
can be shown that conversely, if a test is known to be most powerful at level
equal to its size, then its test function must be of the form (A-7) except
perhaps on a set of x values of probability measure zero. Additionally, we
may always require r(x) in (A-7) to be a constant r in [0,1]. Finally, we note
that we are always guaranteed the existence of such a test for $H_0$ versus
$H_1$, of given size $\alpha$ [Lehmann, 1959, Ch. 3].

From the above result we see that generally the structure of a most
powerful test may be described as one comparing a likelihood ratio to a
constant threshold,

$$\frac{p_X(x \mid \theta_{H_1})}{p_X(x \mid \theta_{H_0})} > t \qquad (A\text{-}8)$$

in deciding if the alternative $H_1$ is to be accepted. If the likelihood ratio
on the left-hand side of (A-8) equals the threshold value t, the alternative
$H_1$ may be accepted with some probability r (the randomization probability).
The constants t and r may be evaluated to obtain a desired size $\alpha$ using
knowledge of the distribution function of the likelihood ratio under $H_0$.
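A worked instance may help fix ideas. The sketch below assumes a hypothetical problem that is not taken from the text: n independent unit-variance Gaussian observations with mean 0 under the null and mean `mu` > 0 under the alternative. In that case the likelihood ratio is exp(mu·Σx − n·mu²/2), so comparing it with t is the same as comparing Σx with a threshold, and randomization is unnecessary because the ratio is continuous.

```python
import math
from statistics import NormalDist

def np_test(alpha, mu, n):
    """Neyman-Pearson test for N(0,1) versus N(mu,1) with n samples and size alpha.

    Returns (likelihood-ratio threshold t, equivalent threshold on the sum of
    the observations, detection probability P_D).
    """
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha)                   # P(Z > z) = alpha for standard normal Z
    sum_threshold = math.sqrt(n) * z               # under H0 the sum is N(0, n)
    log_t = mu * sum_threshold - n * mu * mu / 2.0 # log of the threshold in (A-8)
    p_d = 1.0 - std.cdf(z - mu * math.sqrt(n))     # under H1 the sum is N(n mu, n)
    return math.exp(log_t), sum_threshold, p_d
```

The threshold is fixed entirely by the null distribution of the statistic, and the detection probability then follows from the alternative.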

When the alternative hypothesis $H_1$ is composite we may look for a test
which is uniformly most powerful (UMP) in testing $H_0$ against $H_1$, that is,
one which is most powerful for $H_0$ against each $\theta = \theta_{H_1}$ in
$\Theta_{H_1}$. While UMP tests can be found in some cases, notably in many
situations involving Gaussian noise in signal detection, such tests do not
exist for many other problems of interest. One option in such situations is to
place further restrictions on the class of acceptable or admissible tests in
defining a most powerful test; for example, a requirement of unbiasedness or of
invariance may be imposed [Lehmann, 1959, Ch. 4-6]. As an alternative, other
performance criteria based on the power function may be employed. We will
consider one such criterion, leading to locally optimum or locally most
powerful tests for composite alternatives, in the next section. One approach to
obtaining reasonable tests for composite hypotheses is to use
maximum-likelihood estimates $\hat{\theta}_{H_0}$ and $\hat{\theta}_{H_1}$ of
the parameter $\theta$, obtained under the constraints
$\theta \in \Theta_{H_0}$ and $\theta \in \Theta_{H_1}$, respectively, in place
of $\theta_{H_0}$ and $\theta_{H_1}$ in (A-8). The resulting test is called a
generalized likelihood ratio (GLR) test or simply a likelihood ratio test.

B.  LOCAL  OPTIMALITY  AND  THE  GENERALIZED  NEYMAN-PEARSON 

LEMMA 

Let us now consider the approach to construction of tests for composite
alternative hypotheses. In this approach attention is concentrated on
alternatives $\theta = \theta_{H_1}$ in $\Theta_{H_1}$ which are close, in the
sense of a metric or distance, to the null-hypothesis parameter value
$\theta = \theta_{H_0}$. Specifically, let $\theta$ be a real-valued parameter
with value $\theta = \theta_0$ defining the simple null hypothesis and let
$\theta > \theta_0$ define the composite alternative hypothesis. Consider the
class of all tests based on test functions $\delta(x)$ of a particular desired
size $\alpha$ for $\theta = \theta_0$ against $\theta > \theta_0$, and assume
that the power functions $\mathcal{P}(\theta \mid \delta)$ of these tests are
continuous and also continuously differentiable at $\theta = \theta_0$. Then if
we are interested primarily in performance for alternatives which are close to
the null hypothesis, we can use as a measure of performance the slope of the
power function at $\theta = \theta_0$, that is

$$\mathcal{P}'(\theta_0 \mid \delta) = \left. \frac{d}{d\theta} \mathcal{P}(\theta \mid \delta) \right|_{\theta = \theta_0}. \qquad (A\text{-}9)$$

From among our class of tests of size $\alpha$, the test based on
$\delta^*(x)$ which uniquely maximizes $\mathcal{P}'(\theta_0 \mid \delta)$ has
a power function satisfying

$$\mathcal{P}(\theta \mid \delta^*) \ge \mathcal{P}(\theta \mid \delta), \qquad \theta_0 < \theta < \theta_{max} \qquad (A\text{-}10)$$

for some $\theta_{max} > \theta_0$. Such a test is called a locally most
powerful or locally optimum (LO) test for $\theta = \theta_0$ against
$\theta > \theta_0$. It is clearly of interest in situations such as the
weak-signal case in signal detection, when the alternative-hypothesis parameter
values of primary concern are those which define pdf's $p_X(x \mid \theta)$
close to the null-hypothesis noise-only pdf $p_X(x \mid \theta_0)$.

The  following  generalization  of  the  Neyman-Pearson  fundamental  result 
of  Theorem  1  can  be  used  to  obtain  the  structure  of  an  LO  test: 
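Theorem 2: Let g(x) and $h_1(x), h_2(x), \ldots, h_m(x)$ be real-valued and
integrable functions defined on $\mathbb{R}^n$. Let an integrable function
$\delta(x)$ on $\mathbb{R}^n$ have the characteristics

$$\delta(x) = \begin{cases} 1 & \text{if } g(x) > \sum_{i=1}^{m} t_i h_i(x) \\ r(x) & \text{if } g(x) = \sum_{i=1}^{m} t_i h_i(x) \\ 0 & \text{if } g(x) < \sum_{i=1}^{m} t_i h_i(x) \end{cases} \qquad (A\text{-}11)$$

for a set of constants $t_i \ge 0$, i = 1, 2, ..., m, and where
$0 \le r(x) \le 1$. Define, for i = 1, 2, ..., m, the quantities

$$\alpha_i = \int_{\mathbb{R}^n} \delta(x)\, h_i(x)\, dx. \qquad (A\text{-}12)$$

Then from within the class of all test functions satisfying the m constraints
(A-12), the function $\delta(x)$ defined by (A-11) maximizes
$\int_{\mathbb{R}^n} \delta(x)\, g(x)\, dx$.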

A  more  complete  version  of  the  above  theorem,  and  its  proof,  may  be 
found  in  [Lehmann,  1959,  Ch.  3];  Ferguson  [1967,  Ch.  5]  also  discusses  the  use 
of  this  result. 

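To use the above result in finding an LO test for $\theta = \theta_0$ against $\theta > \theta_0$ defining $\Theta_{H_0}$ and $\Theta_{H_1}$ in (A-1) and (A-2), respectively, let us write (A-9) explicitly as

$$\mathcal{P}'(\theta_0 \mid \delta) = \frac{d}{d\theta} \int_{\mathbb{R}^n} \delta(x)\, P_X(x \mid \theta)\, dx \,\Big|_{\theta=\theta_0} = \int_{\mathbb{R}^n} \delta(x)\, \frac{d}{d\theta} P_X(x \mid \theta)\Big|_{\theta=\theta_0}\, dx \tag{A-13}$$

assuming that our pdf's are such as to allow the interchange of the order in which limits and integration operations are performed. Taking $m = 1$ and identifying $h_1(x)$ with $P_X(x \mid \theta_0)$ and $g(x)$ with $\frac{d}{d\theta} P_X(x \mid \theta)\big|_{\theta=\theta_0}$ in Theorem 2, we are led to the locally optimum test which accepts the alternative $H_1\colon \theta > \theta_0$ when

$$\frac{\frac{d}{d\theta} P_X(x \mid \theta)\big|_{\theta=\theta_0}}{P_X(x \mid \theta_0)} > t \tag{A-14}$$

where $t$ is the test threshold value which results in a size-$\alpha$ test satisfying

$$E\{\delta(X) \mid H_0\colon \theta = \theta_0\} = \alpha. \tag{A-15}$$

The test of (A-14) may also be expressed as one accepting the alternative when

$$\frac{d}{d\theta} \ln P_X(x \mid \theta)\Big|_{\theta=\theta_0} > t. \tag{A-16}$$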
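As a rough illustration of the statistic in (A-16), the sketch below evaluates the derivative of the log-likelihood numerically and compares it with a closed form. The i.i.d. unit-variance Gaussian family with mean $\theta$, the sample values, and the function names `log_pdf` and `lo_statistic` are assumptions made only for this example; they do not come from the text.

```python
import math

def log_pdf(x, theta):
    """Log of the joint pdf of i.i.d. unit-variance Gaussian samples with mean theta
    (an assumed example family, not the one treated in the text)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - theta) ** 2 for xi in x)

def lo_statistic(x, theta0, h=1e-5):
    """Central-difference estimate of d/dtheta ln p(x | theta) at theta0, the form in (A-16)."""
    return (log_pdf(x, theta0 + h) - log_pdf(x, theta0 - h)) / (2 * h)

x = [0.3, -1.2, 0.8, 2.1]          # hypothetical observations
theta0 = 0.0
numeric = lo_statistic(x, theta0)
analytic = sum(xi - theta0 for xi in x)  # closed-form derivative for this Gaussian family
```

For this family the two values agree up to rounding, and the locally optimum test reduces to comparing the sample sum with a threshold.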


Theorem 2 may also be used to obtain tests maximizing the second derivative $\mathcal{P}''(\theta \mid \delta)$ at $\theta = \theta_0$. This would be appropriate to attempt if it so happens that $\mathcal{P}'(\theta_0 \mid \delta) = 0$ for all size-$\alpha$ tests for a given problem. The condition $\mathcal{P}'(\theta_0 \mid \delta) = 0$ will occur if $\frac{d}{d\theta} P_X(x \mid \theta)\big|_{\theta=\theta_0}$ is zero, assuming the requisite regularity conditions mentioned above. In this case Theorem 2 can be applied to obtain the locally optimum test accepting the alternative hypothesis $H_1\colon \theta > \theta_0$ when

$$\frac{\frac{d^2}{d\theta^2} P_X(x \mid \theta)\big|_{\theta=\theta_0}}{P_X(x \mid \theta_0)} > t. \tag{A-17}$$

One type of problem for which Theorem 2 is useful in characterizing locally optimum tests is that of testing $\theta = \theta_0$ against the two-sided alternative hypothesis $\theta \ne \theta_0$. We have previously mentioned that one can impose the condition of unbiasedness on the allowable tests for a problem. Unbiasedness of a size-$\alpha$ test for the hypotheses $H_0$ and $H_1$ of (A-1) and (A-2) means that the test satisfies

$$\mathcal{P}(\theta \mid \delta) \le \alpha, \quad \text{all } \theta \in \Theta_{H_0} \tag{A-18}$$

$$\mathcal{P}(\theta \mid \delta) \ge \alpha, \quad \text{all } \theta \in \Theta_{H_1} \tag{A-19}$$

so that the detection probability for any $\theta \in \Theta_{H_1}$ is never less than the size $\alpha$. For the two-sided alternative hypothesis $\theta \ne \theta_0$, suppose the pdf's $P_X(x \mid \theta)$ are sufficiently regular so that the power functions of all tests are twice continuously differentiable at $\theta = \theta_0$. Then it follows that for any unbiased size-$\alpha$ test we will have $\mathcal{P}(\theta_0 \mid \delta) = \alpha$ and $\mathcal{P}'(\theta_0 \mid \delta) = 0$. Thus, the test function of a locally optimum unbiased test can be characterized by using these two constraints and maximizing $\mathcal{P}''(\theta_0 \mid \delta)$ in Theorem 2. Another interpretation of the above approach for the two-sided alternative hypothesis is that the quantity $\omega = (\theta - \theta_0)^2$ may then be used as a measure of the distance of any alternative hypothesis from the null hypothesis $\theta = \theta_0$. We have

$$\frac{d}{d\omega} \mathcal{P}(\theta \mid \delta)\Big|_{\omega=0} = \frac{1}{2(\theta - \theta_0)} \frac{d}{d\theta} \mathcal{P}(\theta \mid \delta)\Big|_{\theta=\theta_0} = \frac{1}{2} \mathcal{P}''(\theta_0 \mid \delta) \tag{A-20}$$

if $\mathcal{P}'(\theta_0 \mid \delta)$ is zero, for sufficiently regular pdf's $P_X(x \mid \theta)$; the middle expression is understood as the limit $\theta \to \theta_0$. Thus if $\mathcal{P}'(\theta_0 \mid \delta)$ is zero for a class of size-$\alpha$ tests, then maximization of $\mathcal{P}''(\theta_0 \mid \delta)$ leads to a test which is locally optimum within that class.

In this appendix we are concerned with problems where the noise density function $P$ is completely specified, as a special case of the general parametric problem where $P$ may have a finite number of unknown parameters (such as the noise variance). Our detection problem can be formulated as a statistical hypothesis-testing problem of choosing between a null hypothesis $H_0$ and an alternative hypothesis $H_1$ describing the joint density function $P_X$ of the observation vector $X$, with

$$H_0\colon\ P_X(x) = \prod_{i=1}^{n} P(x_i) \tag{A-21}$$

$$H_1\colon\ P_X(x) = \prod_{i=1}^{n} P(x_i - \theta s_i), \quad s \text{ specified, any } \theta > 0. \tag{A-22}$$

Here $s$ is the vector $(s_1, s_2, \ldots, s_n)$ of signal components. Note that we are considering parametric hypotheses which completely define $P_X$ to within a finite number of unknown parameters (here with only $\theta > 0$ unknown under the alternative hypothesis). Let us now proceed to obtain the structures of tests for $H_0$ versus $H_1$.

C. LOCALLY OPTIMUM DETECTION AND ASYMPTOTIC OPTIMALITY

Since the alternative hypothesis $H_1$ is not a simple hypothesis, the signal amplitude value being unspecified, we cannot apply directly the fundamental lemma of Neyman and Pearson to obtain the structure of the optimum detector for the detection problem. For non-Gaussian noise densities it is also generally impossible to obtain UMP tests for the composite alternative hypothesis $H_1$.

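To illustrate the difficulty, consider the special case where $P$ is specified to be the double-exponential noise density function defined by

$$P(x) = \frac{a}{2} e^{-a|x|}, \quad a > 0. \tag{A-23}$$

The likelihood ratio for testing $H_0$ versus $H_1$ for a particular value $\theta = \theta_0 > 0$ is

$$L(X) = \prod_{i=1}^{n} \frac{P(X_i - \theta_0 s_i)}{P(X_i)}. \tag{A-24}$$

This now becomes

$$L(X) = \exp\Big(-a \sum_{i=1}^{n} \big(|X_i - \theta_0 s_i| - |X_i|\big)\Big) \tag{A-25}$$

giving

$$\ln L(X) = a \sum_{i=1}^{n} \big(|X_i| - |X_i - \theta_0 s_i|\big). \tag{A-26}$$

Thus for given $\theta = \theta_0$, the test based on

$$\lambda(X) = \sum_{i=1}^{n} \big(|X_i| - |X_i - \theta_0 s_i|\big) \tag{A-27}$$

is an optimum test, since the constant $a$ is positive. The optimum detector therefore has a test function defined by

$$\delta(X) = \begin{cases} 1, & \lambda(X) > t \\ r, & \lambda(X) = t \\ 0, & \lambda(X) < t \end{cases} \tag{A-28}$$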

where the threshold $t$ and randomization probability $r$ are chosen to obtain the desired value for the false-alarm probability $P_{FA}$, so that the equation

$$E\{\delta(X) \mid H_0\} = P_{FA} \tag{A-29}$$

is satisfied. Notice that we do not need randomization at $\lambda(X) = t$ if this event has zero probability under $H_0$.
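The threshold selection described here can be sketched numerically. The following is a minimal Monte Carlo illustration, not a procedure from the text: it computes the statistic $\sum_i (|x_i| - |x_i - \theta_0 s_i|)$ of (A-27) and picks $t$ as an empirical quantile of that statistic under noise-only data. The helper names (`lam`, `laplace_sample`, `mc_threshold`), the trial count, and the example signal are all hypothetical.

```python
import random

def lam(x, s, theta0):
    """Statistic of (A-27): sum over i of |x_i| - |x_i - theta0 * s_i|."""
    return sum(abs(xi) - abs(xi - theta0 * si) for xi, si in zip(x, s))

def laplace_sample(a, rng):
    """Draw from the density (a/2) * exp(-a|x|): an exponential magnitude with a random sign."""
    magnitude = rng.expovariate(a)
    return magnitude if rng.random() < 0.5 else -magnitude

def mc_threshold(s, theta0, a, p_fa, trials=20000, seed=0):
    """Empirical (1 - p_fa) quantile of lam(X) under H0 (noise only); 0 < p_fa <= 1 assumed."""
    rng = random.Random(seed)
    stats = sorted(
        lam([laplace_sample(a, rng) for _ in s], s, theta0) for _ in range(trials)
    )
    return stats[int((1.0 - p_fa) * trials)]

# Hypothetical usage: constant signal of length 4, design amplitude 0.5.
t = mc_threshold([1.0, 1.0, 1.0, 1.0], theta0=0.5, a=1.0, p_fa=0.1, trials=5000)
```

Because each term of the statistic is bounded by $\theta_0 |s_i|$ in magnitude, the statistic takes its extreme values with non-zero probability. An empirical quantile that lands on such a value gives a false-alarm rate below the target, which is the situation where the randomization probability $r$ matters.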

We can express $\lambda(X)$ of (A-27) in the form

$$\lambda(X) = \sum_{i=1}^{n} l(X_i; \theta_0 s_i) \tag{A-30}$$

where the characteristic $l$ is defined by

$$l(x; \theta s) = |x| - |x - \theta s|. \tag{A-31}$$

This is shown in Figure A.2 as a function of $x$ and depends strongly on $\theta$, so that $\lambda(X)$ cannot be expressed in a simpler form decoupling $\theta_0$ and the $X_i$. For an implementation of the test statistic $\lambda(X)$ the value $\theta_0$ of $\theta$ must be known, and a UMP test does not exist for this problem for $n > 1$.
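To make the dependence on $\theta$ concrete, the sketch below assumes the characteristic is $l(x; c) = |x| - |x - c|$ with $c = \theta s$, which is what (A-27) and (A-30) together imply, and checks that it equals a straight line of slope 2 clipped at $\pm|c|$. The names `l_char` and `l_clipped` are illustrative only.

```python
def l_char(x, c):
    """Characteristic l(x; c) = |x| - |x - c|, with c standing for theta * s."""
    return abs(x) - abs(x - c)

def l_clipped(x, c):
    """Equivalent form: sgn(c) * (2x - c), limited to the interval [-|c|, |c|]."""
    sign = 1.0 if c >= 0 else -1.0
    return max(-abs(c), min(abs(c), sign * (2.0 * x - c)))

# Both the clipping levels (+/- |c|) and the location of the linear segment
# (between 0 and c) move with theta, so theta cannot be factored out of the sum.
samples = [l_char(x, 0.5) for x in (-1.0, 0.0, 0.25, 0.5, 1.0)]
```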

One approach we might take in the above case is to use a generalized likelihood ratio (GLR) test, here obtained by using as the test statistic $\lambda(X)$ of (A-27) with $\theta_0$ replaced by its maximum likelihood (ML) estimate under the alternative hypothesis $H_1$. This maximum-likelihood estimate $\hat{\theta}_{ML}$ is given implicitly as the solution of the equation

$$\sum_{i=1}^{n} s_i\, \mathrm{sgn}(X_i - \hat{\theta}_{ML}\, s_i) = 0 \tag{A-32}$$

where

$$\mathrm{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{A-33}$$

provided that the solution turns out to be non-negative; otherwise, $\hat{\theta}_{ML} = 0$. Thus the implementation of the GLR test is not simple. In addition, the distribution of the GLR test statistic under the null hypothesis is not easily obtained.
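One way to see what solving (A-32) involves: for $s_i \ne 0$ we have $s_i\,\mathrm{sgn}(x_i - \theta s_i) = |s_i|\,\mathrm{sgn}(x_i/s_i - \theta)$, so the left side changes sign at a weighted median of the ratios $x_i/s_i$ with weights $|s_i|$. The sketch below uses that observation; it is an illustration under this reading of (A-32), not an algorithm given in the text, and `theta_ml` is a hypothetical name.

```python
def theta_ml(x, s):
    """Non-negative ML amplitude estimate for double-exponential noise.

    Minimizes sum_i |x_i - theta * s_i| over theta >= 0 by taking a weighted
    median of x_i / s_i (weights |s_i|) and clipping it at zero.
    """
    pairs = sorted((xi / si, abs(si)) for xi, si in zip(x, s) if si != 0)
    half = 0.5 * sum(w for _, w in pairs)
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:          # first ratio at which the cumulative weight reaches one half
            return max(ratio, 0.0)
    return 0.0                   # all s_i are zero: the data carry no amplitude information

est = theta_ml([0.4, 2.0, -1.0, 3.0], [1.0, 2.0, -1.0, 0.5])
```

Even with this shortcut the estimate requires sorting the data for every decision, and the null distribution of the resulting statistic remains awkward, which is consistent with the remark that the GLR test is not simple.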


Figure A.2. The Characteristic $l(x; \theta s)$ of Equation A-31


In the general case, for any noise density function $P$, the optimum detector for given $\theta = \theta_0 > 0$ under $H_1$ can be based on the test statistic

$$\lambda(X) = \ln L(X) = \sum_{i=1}^{n} \ln \frac{P(X_i - \theta_0 s_i)}{P(X_i)} \tag{A-34}$$

which is of the form of $\lambda(X)$ of (A-30). But again, $\theta_0$ must be specified and the detector will be optimum only for a signal with that amplitude. The GLR detector can be obtained if the ML estimate $\hat{\theta}_{ML}$ of $\theta$ can be found under the constraint that $\hat{\theta}_{ML}$ be non-negative. Once again, in general this will not lead to an easily implemented and easily analyzed system.


D.  LOCALLY  OPTIMUM  DETECTORS 

The above discussion shows that we have to search further in order to obtain reasonable schemes for detection of a known signal of unspecified amplitude in additive non-Gaussian noise. By a "reasonable" scheme we mean a detector that is practical to implement and relatively easy to analyze for performance, which should be acceptable for the anticipated range of input signal amplitudes. Fortunately, there is one performance criterion with respect to which it is possible to derive a simple and useful canonical structure for the optimum detector for our detection problem. This is the criterion of local detection power, and leads to detectors which are said to be locally optimum.

A locally optimum (LO) or locally most powerful detector is one which maximizes the slope of the detector power function at the origin ($\theta = 0$), from among the class of all detectors which have the same false-alarm probability. Let $\Delta_\alpha$ be the class of detectors of size $\alpha$ for $H_0$ versus $H_1$. In our notation any detector $D$ in $\Delta_\alpha$ is based on a test function $\delta(X)$ for which

$$E\{\delta(X) \mid H_0\} = \alpha. \tag{A-35}$$

Let $\mathcal{P}_d(\theta \mid D)$ be the power function of detector $D$, that is,

$$\mathcal{P}_d(\theta \mid D) = E\{\delta(X) \mid H_1\}. \tag{A-36}$$

Formally, an LO detector $D_{LO}$ of size $\alpha$ is a detector in $\Delta_\alpha$ which satisfies

$$\max_{D \in \Delta_\alpha} \frac{d}{d\theta} \mathcal{P}_d(\theta \mid D)\Big|_{\theta=0} = \frac{d}{d\theta} \mathcal{P}_d(\theta \mid D_{LO})\Big|_{\theta=0}. \tag{A-37}$$

It would be appropriate to use a locally optimum detector when one is interested primarily in detecting weak signals, for which $\theta$ under the alternative hypothesis $H_1$ remains close to zero. The idea is that an LO detector has a larger slope for its power function at $\theta = 0$ than any other detector $D$ of the same size which is not an LO detector, and this will ensure that the power of the LO detector will be larger than that of the other detector at least for $\theta$ in some non-null interval $(0, \theta_{max})$, with $\theta_{max}$ depending on $D$. This is illustrated in Figure A.3. Note that if an LO detector is not unique, then one may be better than another for $\theta > 0$. There is good reason to be concerned primarily with weak-signal detection. It is the weak signal that one has the most difficulty in detecting, whereas most ad hoc detection schemes should perform adequately for strong signals; after all, the detection probability is upper bounded by unity.


Figure A.3. Power Functions of Optimum and LO Detectors

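To obtain explicitly the canonical form of the LO detector, we can apply the generalized Neyman-Pearson lemma of Section A.2. Now the power function of a detector $D$ based on a test function $\delta(X)$ is

$$\mathcal{P}_d(\theta \mid D) = \int_{\mathbb{R}^n} \delta(x) \prod_{i=1}^{n} P(x_i - \theta s_i)\, dx \tag{A-38}$$

where the integration is over the $n$-dimensional Euclidean space $\mathbb{R}^n$. The regularity assumptions allow us to get

$$\frac{d}{d\theta} \mathcal{P}_d(\theta \mid D)\Big|_{\theta=0} = \int_{\mathbb{R}^n} \delta(x)\, \frac{d}{d\theta} \prod_{i=1}^{n} P(x_i - \theta s_i)\Big|_{\theta=0}\, dx = \int_{\mathbb{R}^n} \delta(x) \left[ \sum_{i=1}^{n} s_i \left( -\frac{P'(x_i)}{P(x_i)} \right) \right] \prod_{i=1}^{n} P(x_i)\, dx. \tag{A-39}$$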


From this it follows, from the generalized Neyman-Pearson lemma, that a locally optimum detector $D_{LO}$ is based on the test statistic

$$\lambda_{LO}(X) = \sum_{i=1}^{n} s_i \left[ -\frac{P'(X_i)}{P(X_i)} \right] = \sum_{i=1}^{n} s_i\, g_{LO}(X_i) \tag{A-40}$$

where $g_{LO}$ is the function defined by

$$g_{LO}(x) = -\frac{P'(x)}{P(x)}. \tag{A-41}$$

Note that we may express $\lambda_{LO}(X)$ as

$$\lambda_{LO}(X) = \frac{d}{d\theta} \sum_{i=1}^{n} \ln \frac{P(X_i - \theta s_i)}{P(X_i)} \Big|_{\theta=0} \tag{A-42}$$

from which the LO detector test statistic (multiplied by $\theta$) is seen to be a first-order approximation of the optimum detector test statistic given by (A-34).

For the double-exponential noise density of (A-23) we find that $g_{LO}$ is given by

$$g_{LO}(x) = a\, \mathrm{sgn}(x). \tag{A-43}$$

Note that the optimum detector for $\theta = \theta_0$ in this case is based on the test statistic $\lambda(X)$ of (A-27). Similarly, for a zero-mean Gaussian density with variance $\sigma^2$ we have

$$g_{LO}(x) = \frac{x}{\sigma^2}. \tag{A-44}$$
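The two nonlinearities above can be compared in a few lines. The sketch below assumes $g_{LO}(x) = a\,\mathrm{sgn}(x)$ for the double-exponential density and $g_{LO}(x) = x/\sigma^2$ for the zero-mean Gaussian density (both follow from $g_{LO} = -P'/P$), forms the statistic $\sum_i s_i\, g_{LO}(x_i)$, and, for the Gaussian case, checks how closely $\theta$ times that statistic tracks the log-likelihood ratio. The function names and data are illustrative assumptions, not taken from the text.

```python
def sgn(x):
    """Sign function of (A-33), returning -1, 0 or 1."""
    return (x > 0) - (x < 0)

def g_lo_laplace(x, a):
    """-P'(x)/P(x) for P(x) = (a/2) exp(-a|x|)."""
    return a * sgn(x)

def g_lo_gauss(x, sigma):
    """-P'(x)/P(x) for a zero-mean Gaussian density with variance sigma**2."""
    return x / sigma ** 2

def lo_statistic(x, s, g):
    """LO test statistic: sum over i of s_i * g(x_i)."""
    return sum(si * g(xi) for xi, si in zip(x, s))

def gauss_log_lr(x, s, theta, sigma):
    """ln L(X) for Gaussian noise: sum of ln P(x_i - theta*s_i) - ln P(x_i)."""
    return sum((xi ** 2 - (xi - theta * si) ** 2) / (2 * sigma ** 2)
               for xi, si in zip(x, s))

x = [0.5, -2.0, 3.0]      # hypothetical observations
s = [1.0, 1.0, -1.0]      # hypothetical known signal
sign_corr = lo_statistic(x, s, lambda v: g_lo_laplace(v, 2.0))  # sign correlator
lin_corr = lo_statistic(x, s, lambda v: g_lo_gauss(v, 2.0))     # linear correlator

theta = 0.01
gap = gauss_log_lr(x, s, theta, 2.0) - theta * lin_corr  # second-order remainder
```

In the Gaussian case the remainder `gap` equals $-\theta^2 \sum_i s_i^2 / (2\sigma^2)$, so it shrinks quadratically as $\theta \to 0$. The sign correlator ignores sample magnitudes entirely, whereas the linear correlator weights each sample by its size.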
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